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This  thesis  deals  with  choosing  preconditioning  strategies  to 
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The  hypothetical  multiprocessor  considered  consists  of  p  linearly 
connected  processors.  A  variety  of  popular  preconditioning  strategies 
for  sequential  machines  are  examined.  Numerical  experiments  are 
conducted  and  recommendations  made. ... 
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J_.  Introduction 

My  thesis  deals  with  solving  systems  of  linear  equations 

Ax  =  b,  (1.1) 
where  A  is  a  sparse  symmetric  and  positive  definite  matrix.  Systems  of 
this  type  arise  from  the  discretization  of  second  order  self-adjoint 
elliptic  partial  differential  equations.  Many  direct  and  iterative 
numerical  methods  have  been  developed  for  solving  this  problem;  see  for 
example  [Varg62],  (Wach66],  [Youn71],  [HaYo81]  and  [Birk81].  The  advent 
of  multiprocessor  systems  brings  with  it  the  possibility  of  substantial 
speedup  in  performing  these  types  of  numerical  methods.  This  would 
allow  us  to  examine  problems  that,  until  now,  had  been  too  large  or 
complex  to  be  computationally  feasible.  The  new  multiprocessor  systems 
will  require  that  new  numerical  methods  be  generated  or  that  older 
methods  be  modified  to  take  full  advantage  of  their  potential. 

In  this  paper  I  consider  only  the  conjugate  gradient  method  and 
preconditioning  strategies  that  are  best  suited  for  implementation  on  a 
multiprocessor  system.  The  hypothetical  multiprocessor  that  will  be 
considered  consists  of  p  linearly  connected  processors  as  shown  in 
figure  1.1. 
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Figure  1.1 
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Each  processor  is  assumed  Co  be  capable  of  performing  any  arithmetic 
operation  in  one  time  step,  and  that  it  takes  4,  time  steps  to  transfer 
one  floating  point  number  from  one  processor  to  either  of  its  neighbors. 

For  sequential  machines,  the  problem  of  preconditioning  the 
conjugate  gradient  algorithm  has  been  extensively  studied  in  the 
liturature.  See  for  example  [AxGu80],  [CoG076],  [Eise81],  [Gust78], 
[HaYo81],  [Kers781,  [MantSO],  [MeVo77],  [Munk80 ] ,  [Reid71],  and 
[Reid72] . 
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_2.  Model  Problem 

Consider  the  second  order  self-adjoint  partial  differential 
equations  of  the  form 

“  l£(x>y)§]  "  |r[c(x’y)|r]  +  f(x>y)u  =  «(x>y)  (2-^ 

with  a(x,y)  >  0,  c(x,y)  >  0  and  f(x,y)  >  0;  defined  on  the  unit  square, 

0  <  x,y  <  1;  and  with  boundary  conditions  of  the  form 

au  +  ^  =  Y  (2.2) 

where  ^  Is  the  derivative  normal  to  the  boundary. 

Superimposing  a  square  grid  of  mesh  size  h  »  l/(n+l)  and  using  central 

difference  approximations  to  the  derivatives,  the  problem  converts  to 

2 

solving  a  linear  system  of  equations  of  order  n  .  This  process  is  fully 
derived  and  explained  in  [Varg62].  Handling  boundary  conditions  of  the 
form  (2.2)  where  p  *  0  Is  discussed  in  [MiGr80], 

Under  certain  boundary  conditions,  the  resulting  coefficient 
matrix  A  is  a  positive  definite  M-matrix  [Vors81].  An  M-matrix  is 
defined  such  that  given  matrix  A  =  (a^), 

1)  au  >  0  2)  at  <  0 

3)  A  *  exists  4)  A  *  >0. 

In  Appendix  D,  I  will  describe  the  nature  of  matrix  A  for  each  of  my 
test  problems.  The  structure  of  matrix  A  is  determined  by  the  grid 
point  ordering  scheme.  Appendix  A  shows  examples  of  the  natural,  point 
red/black,  line  red/black  and  2  line  red/black  ordering  schemes  and  the 
resulting  structure  of  the  matrix  A  for  n“6. 


3.  Background 


3.1.  Conjugate  Gradient  Method 


The  Conjugate  Gradient  (CG)  Method  was  developed  by  Hestanes  and 
Stlefel  In  1952.  The  idea  behind  it  Is  to  approximate  the  solution 


vector  x  by 


-  x<°>  +  I  «  - 
j=l  J  J 


where 


xv  Is  an  arbitrary  initial  guess,  the  vectors  v^ 

T 

are  A-conjugate  (le.  v^Av^  =  0  for  j  *  i  )  and 

the  ac.'s  are  chosen  to  minimize  ||x^m^  -  x||, 

J  A 

1  /2 

where  ||z|lA  3  (z,Az)  . 

The  vectors  vul  are  constructed  by  orthogonalizing  the  residual 
J1-1 

(1)  T 

r^  *  b  -  Ax  with  respect  to  v^,  ie.  r^v^  “  0  for  j  >  1.  In  this  way, 

each  Iteration  Is  attempting  to  minimize  the  components  of  the  residual 

r^  along  the  eigenvector  corresponding  to  the  most  extreme  eigenvalue. 

The  residual  then  lies  almost  entirely  In  the  subspace  of  the 

eigenvectors  with  the  remaining  less  extreme  eigenvalues.  The  iteration 
proceeds  as  if  the  most  extreme  eigenvectors  and  eigenvalues  were  not 
present  [Ker378]. 


In  the  absence  of  round-off  errors,  the  CG  method  can  be  considered 
a  direct  method,  in  that  it  will  converge  to  the  true  solution  of  a 
system  of  order  n  in  exactly  n  steps,  due  to  the  orthogonality  of  the 
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vectors  v^.  In  fact,  If  the  nxn  matrix  A  has  only  r  distinct 
eigenvalues,  then  the  method  converges  in  only  r  steps.  Many  times,  the 
relative  error  ||x^-  x||/||x||  will  be  quite  small  even  for  i  «  n. 

Unfortunately,  in  the  presence  of  round-off  errors,  the  orthogonality  of 
the  vectors  v^  can  break  down  and  the  guaranteed  finite  convergence  is 
lost.  It  was  this  breakdown  that  prevented  the  CG  method  from  getting 
much  attention.  It  wasn't  until  1971  that  interest  was  renewed  in  Che 
CG  method.  At  that  time,  Reid  [Reld71]  showed  that  the  CG  algorithm  is 
very  effective  for  handling  large  and  sparse  positive  definite  linear 
systems  as  arise  from  our  model  problem.  Its  cause  was  further  helped 
when  Concus,  Golub  and  O'Leary  [CoG076]  showed  that  it  could  be  used  as 
an  effective  Cool  for  accelerating  the  convergence  of  various  iterative 
methods.  They  pointed  out  that  the  CG  method  possesses  some  very 
attractive  properties: 

1)  doesn't  require  prior  knowledge  of  extreme  eigenvalues  to 
calculate  optimal  convergence  parameters 

2)  takes  advantage  of  the  entire  distribution  of  eigenvalues  of 
matrix  A 


3)  is  optimal  in  the  class  of  all  algorithms  for  which 


:(k+1)  -  x(0)  +  Pk(K)r(0)  where  K 


*  I  -  M_1N,  A  =  M  -  N  is  a 


regular  splitting  and  Pk(£)  is  a  polynomial  of  degree  k,  in 
the  sense  that  it  minimizes  i  Ix^j  -  x||^. 

See  [CoG076]  for  more  details. 
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The  rate  of  convergence  of  the  CG  algorithm  depends  heavily  on  the 
distribution  of  eigenvalues  of  matrix  A.  The  fewer  distinct  eigenvalues 
or  the  more  clustered  the  eigenvalues,  the  quicker  the  convergence. 
Unfortunately,  the  matrices  arising  from  our  model  problem  tend  to  have 
eigenvalue  distributions  that  are  widely  distributed  with  little 
clustering.  As  a  result,  the  CG  algorithm  by  itself  tends  to  do  poorly. 
This  situation  can  be  improved  by  "preconditioning"  matrix  A. 


3._2.  Preconditioning 


The  idea  behind  preconditioning  is  to  obtain  a  matrix  C  such  that  C 
is  positive  definite  and  C  *A  has  a  "better"  eigenvalue  distribution. 
It  is  also  important  to  choose  matrix  C  such  that  solving  a  system 
Cw  »  q  is  as  easy  as  possible.  The  CG  algorithm  is  then  applied  to  the 
new  preconditioned  system 

C_1Ax  =  C_1b. 

This  notation  has  one  problem  in  that  C  *A  may  no  longer  be  symmetric. 
It  is  better  to  consider  the  preconditioned  system 


(C-1/2AC-1/2KC‘'2K) 
(L_1AL~T)(LTx)  - 


-  c-1/2b, 

L-1b, 


where  C 


T 

LL  . 


or 


Obviously  the  best  eigenvalue  distribution  for  C  *A  would  be 
achieved  when  C  *  A,  then  C  *A  ■  I.  This  does  not  help  us  much, 
however,  in  that  solving  a  system  Cw  *  q  is  no  easier  than  solving  the 


original  system.  The  idea  then  is  to  choose  matrix  C  as  close  as 
possible  to  A,  such  that  C  would  have  a  few  extreme  eigenvalues  with 
the  rest  clustered  around  unity,  while  still  requiring  Cw  »  q  be  easy  to 
solve. 

When  matrix  A  is  an  M-matrix,  Meijerink  and  van  der  Vorst  [MeVo77] 
introduced  a  set  of  preconditioning  strategies  based  on  an  incomplete 
factorization  of  matrix  A.  The  idea  is  to  choose  C  ■  LU,  such  that 
matrix  C  resembles  matrix  A,  A  »  C  -  R,  with  L  and  U  almost  as  sparse  as 
matrix  A.  The  sparsity  of  L  and  U  is  controlled  by  forcing  certain 
predetermined  positions  within  L  and  U  to  be  zero.  These  positions  are 
defined  by  a  set  P  of  places  (i,j)  such  that 

PCPn  =  {  <i,j)  I  i*j  l<i <n,  l<j<n  } 
where  P^  contains  all  pairs  of  indices  of 
off-diagonal  matrix  elements. 

When  matrix  A  is  symmetric,  we  add  the  restriction  to  the  set  P  that  if 

(i,j)  e  P  then  so  must  (j,i)  e  P  and  consider  an  incomplete  Cholesky 
T  T 

factorization  (LL  or  LDL  ).  Meijerink  and  van  der  Vorst  proved  that 
if  matrix  A  is  an  M-matrix,  then  this  process  is  stable  and  the 
resulting  factorization 

T  T 

C  -  LL  or  LDL 

is  positive  definite. 

Using  this  set  P  notation,  we  can  describe  most  of  the  basic 
preconditioning  strategies.  On  the  extremes,  we  have  P  »  P^  and  P  -  0 
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which  result  In  diagonal  scaling,  C  ■  diag(A)  and  preconditioning  by 
complete  Cholesky  factorization,  C  =  A,  respectively.  In  between  we 
have 

P*  =  {  (1 » J )  I  A(l,j)-0  } 

which  Is  the  preconditioning  strategy  used  by  the  ICCG(O)  algorithm  of 
Meijerlnk  anti  van  der  Vorst  [MeVo77]. 

When  Matrix  A  Is  positive  definite,  but  not  an  M-matrix,  non¬ 
positive  or  small  diagonal  elements  can  result  during  the  factorization 
process,  causing  matrix  C  to  be  no  longer  positive  definite.  A  number 
of  modifications  have  been  proposed  to  solve  this  problem.  Kershaw 
[Kers78]  recommends  simply  replacing  the  non-positive  diagonal  elements 
by  suitable  positive  numbers.  He  has  found  that  a  few  diagonal  elements 
can  become  non-positive  and  be  so  replaced  without  distracting  from  the 
incomplete  factorization,  as  long  as  most  of  the  pivots  remain  positive. 
Another  approach  is  simply  to  add  aD  to  matrix  A  before  attempting  the 
incomplete  factorization,  where  D  -  diag(A)  and  a  is  a  positive  scalar. 
This  idea  was  proposed  by  Manteuffel  [Mant80]  in  developing  his  shifted 
Incomplete  Cholesky  factorization.  If  a  is  large  enough,  then  the 
factorization  is  guaranteed  to  be  positive  definite.  However,  choosing 
a  too  large  results  in  very  slow  convergence  of  the  resulting  conjugate 
gradient  algorithm.  Unfortunately,  the  only  way  to  determine  a  "good" 
value  of  a  for  a  given  problem  is  through  trial  and  error.  For  the  test 
problems  considered  by  Manteuffel,  good  results  were  achieved  for  a  of 
0(10"2). 
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A  number  of  variations  on  the  incomplete  factorization  idea  of 
Meijerink  and  van  der  Vorst  have  been  proposed.  Gustafsson  [Gust78] 
introduced  the  concept  of  the  modified  incomplete  factorization.  Here 
the  elements  created  during  the  incomplete  factorization  that  correspond 
to  entries  in  the  set  P  are  added  to  the  diagonal  elements  of  matrix  C 
prior  to  being  discarded.  The  process  is  known  as  diagonal 
modification.  The  MICCG(O)  algorithm  results  when  adding  diagonal 
modification  to  the  ICCG(O)  algorithm.  Gustafsson  reports  that  a  faster 
asymptotic  rate  of  convergence  can  be  achieved. 


Another  variation  has  been  proposed  by  Munksgaard  [Munk80 ] .  Here , 
instead  of  dropping  a  predetermined  set  of  elements  P  during  the 
factorization,  he  proposes  developing  criteria  for  dropping  only  the 
"smaller"  fill-ins  while  retaining  the  "larger"  ones.  The  philosophy 
here  is  that  the  number  of  iterations  required  to  reach  a  solution  is 
more  sensitive  to  the  size  of  the  elements  dropped  than  to  the  number 
dropped.  He  suggests  dropping  fill-in  elements  if  their  numeric  value 
relative  to  the  diagonal  elements  of  their  row  and  column  is  less  than  a 
relative  drop  tolerance.  In  the  kC^  pivot  step  we  drop  i^+D  £f 


.  (k+1)  ,,(k)  .(kKl/2 

Uij  1  <  c(dii  djj  >  * 

The  amount  of  fill-in  is  determined  by  the  size  of  c.  If 
zero,  we  obtain  almost  a  complete  factorization,  while  c 
factorization  where  no  fill-ins  are  added  and  L  has  the 
pattern  as  matrix  A. 


c  is  close  to 
=  1  produces  a 
same  sparsity 
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J3 .3^  Preconditioned  Conjugate  Gradient  Method 

Given  a  preconditioning  matrix  C  and  an  initial  guess  x^,  the 
standard  preconditioned  conjugate  gradient  (PCG)  method  can  be  described 
in  the  following  algorithmic  format: 

Algorithm  3.1 


a)  Initial  step 


1) 

ro  “ 

b  -  AxQ 

2) 

Z0  " 

c'lro 

3) 

P0  “ 

zo 

For 

k  -  0, 

1 , 

1) 

“k  “ 

*rk,zk*^pk,Apk* 

2) 

Vl 

"  *k  +  Vk 

3) 

rk+l 

“  rk  -  °kApk 

4) 

Zk+1 

■  c~lrk+l 

5) 

Pk  “ 

^rk+l*  zk+l^^rk,zk^ 

6) 

pk+l 

’  Zk+1  +  Pkpk* 

A  commonly  used  stopping  criterion  for  this  algorithm  is  to  calculate 
1  /2 

llr^ll  -  (r^.r^)  each  iteration  and  stop  when  llr^ll  <  e,  where  e 
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Is  a  user  specified  parameter. 

One  choice  for  the  initial  iterate  x^  is  a  random  vector.  A  more 
creative  approach  is  to  choose  x^  »  C  *b.  This  uses  the  fact  that  if 
matrix  C  is  close  to  matrix  A,  then  x^  »  C  will  be  a  reasonably 
accurate  estimate  for  x  =  A  *b.  Starting  with  a  more  accurate  estimate 
for  x  will  hopefully  reduce  the  number  of  iterations  required  to 
generate  an  answer  of  desired  accuracy. 

T 

The  ICCG(O)  and  MICCG(O)  algorithms  utilize  incomplete  LDL 
factorization  of  the  form 

C  »  (D  +  DdT^D  +  L)T,  (3.1) 

X  ~ 

where  A  =  L  +  D  +  L  ,  L  Is  strictly  lower  triangular  and  D  and  D  are 

positive  diagonal  matrices.  To  define  D,  I  will  use  figure  A5  and  use 

a^,  b^  and  c^  to  denote  the  elements  of  the  main  diagonal,  upper- 

diagonal  and  mth  upper  diagonal  respectively,  where  i  is  the  row  index 

and  m  is  the  half  band  width  of  the  matrix.  Then  D  =  diag(3, , • • • ,2f  )  is 

i  n 

defined  for  ICCG(O)  as: 

a.  -  a.  -  b2.  ,r.\  -  c^  t.1 

i  i  i-1  i-I  i-m  i-m 
(i  ■  1 ,  2,  •  •  • ,  n) 

and  for  MICCG(O)  as: 

a.  ■  a,  -  b^  -  c,  -  r  -  r 

i  i  i-1  i-1  i-m  i-m  i  i-nrt-1 

ri  "  Ci-lbi-iri-l 
(i  *  1,  2,  •  •  • ,  n) 
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where  in  both  cases,  elements  not  defined  (ie.  subscripts  <  0)  should  be 

T 

replaced  by  zeroes.  For  those  algorithms  where  the  incomplete  LDL 
factorization  can  be  described  in  the  form  (3.1),  Eisenstat  [Eise81] 
proposes  a  different  implementation  of  our  standard  PCG  Algorithm  3.1. 
His  method  reduces  the  number  of  multiply-adds  required  per  iteration  by 
a  factor  approaching  one  half.  This  is  done  by  restating  the  original 
problem  (1.1)  in  the  form 

[ (5  +  L)_1A(D  +  L)“T][(D  +  L)Tx]  »  [ (D  +  L)-1b] 

or 


Ax  »  S.  (3.2) 

It  can  then  be  shown  that  applying  PCG  to  (1.1)  with  preconditioning 

(3.1)  is  equivalent  to  applying  PCG  to  (3.2)  with  preconditioning 
~-1  ~  -T 

C  »  D  and  setting  x  =  (D  +  L)  £.  The  algorithm  can  now  be  written 
as : 

Algorithm  3.2 


a)  Initial  step 

1)  t0  -  (D  +  W'CS  -  Axq) 

2>  K  ■  *0  ■  *0 

b)  For  k  -  0,  1,  • • • 
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2) 

*k+l 

■  *k  + 

\(D  +  L)“Tpk 

3) 

*k+l 

”  \  - 

4) 

Zk+1 

5) 

K  - 

(i-k+l,J 

:k+l)/(rk’V 

6) 

K+i 

*  Zk+1 

+  Kh 

To  calculate  Apk,  the  matrix  A  does  not  have  to  be  explicitly 
calculated.  The  product  can  be  computed  efficiently  by  taking  advantage 
of  the  following  identity: 

Apk  -  (D  +  L)-1 [ (D  +  L)  +  (D  +  L)T  -  (2D  -  D) ] (D  +  L)“Tpk 
This  can  be  simplified,  and  results  in  the  following  two  step 
calculation: 

tk  -  (D  +  L)'Tpk 
Apk  -  tk  +  (D  +  L)_1(pk  -  Ktk), 
where  K  5  2D  -  D. 

This  version  requires  8N  +  NZ(A)  multiply-adds,  versus  6N  +  2NZ(A)  for 
Algorithm  3.1,  where  NZ(A)  -  number  of  non-zero  elements  in  matrix  A. 
Another  3N  multiply-adds  can  be  saved  by  symmetrically  scaling  the 
problem  so  that  D  •  I,  [Eise81]. 
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Ruttshauser  considered  a  version  of  the  PCG  algorithm,  where 
r^  are  calculated  using  a  3-term  recurrence  relation.  It 
represented  in  the  following  algorithmic  format: 

Algorithm  3.3 


a)  Initial  step 

1)  choose  initial  guess 

2)  x_t  =»  0 

3)  *  1 

4)  rQ  -  b  -  Axq 

5)  zo  "  c"% 

b)  For  k  ■  0,  1,  ••• 


1}  “k  -  ^k^k^V02^ 

2) 


“k+i  * 1/11  ~  ir-  iz~  v —j  \  ]  (k>1) 

®k-i  uk-l,rk-l; 


(Vrk} 


3)  Vi  "  Vi  +  WVk  +  \  -  Vi> 

4)  rk+l  ■  rk-l  +  <V*-l^”ak^Zk  +  rk  “  ^-l* 


L 


x^  and 
can  be 
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5) 


k+1 


C-1r 


k+1* 


This  version  is  particularly  useful  when  considering  the  conjugate 
gradient  method  as  a  means  of  accelerating  other  iterative  methods,  as 
in  [CoG076J.  In  general,  Reid  [Reid71]  showed  that  this  version 
required  more  storage  to  implement  than  does  our  standard  PCG 
algorithm  3.1. 


When  matrix  A  possesses  "Property  A",  Reid  [Reid72]  showed  how 
algorithm  3.3  could  be  modified  to  reduce  the  amount  of  work  per 
iteration  by  approximately  one  half.  In  general,  the  same  results  can 
be  obtained  if  our  problem  (1.1)  can  be  partitioned  such  that: 


C,  F 

b, 

1 

1 

1 

T 

F  Co 

*2 

b2 

2. 

J 

This  can  also  be  represented  by  the  two  matrix  equations: 


(3.3) 


Vi 


bl  -  Fx2 


(3.4) 


C2X2 


b2  -  F  Xj .  (3.5) 

The  idea  behind  Reid's  modification  is  to  choose  an  initial  guess  x|^ 
and  then  use  it  to  calculate  x^^  via  (3.5).  This  then  implies  that 
and  forces  ctg  *  1,  where  I  assume  z  is  partitioned  in  the  same 
fashion  as  (3.3).  A  simple  inductive  argument  shows  that  for 

j  -  0,  1,  2,  •  •  • 

a  j 


‘j 


1  and  zf2^  -  z<2J>  -  0. 
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As  a  result,  algorithm  3.3  can  be  reduced  to: 

Algorithm  3.4 

a)  Initial  step 


1)  choose  initial  guess  x^ 


(0) 


2) 

n 

i—t 

3 

l 

3) 

40) 

=  C~l(b2  -  FTX 

4) 

t<°> 

=  (bx  -  Fx^0)) 

5) 

2;o) 

■ 

6) 

e<°> 

-  (z<0).r«>) 

For 

k  =*  0, 

>  1.  2,  ••• 

:  x(0) 
11 


1) 


(2k+l)  f.  x-.^k—l ) 

r2  =  (1  "  “2k+l)r2 


(2k) 

U2k+1F  1 


(2k+l)  -1  (2k+l) 

2)  z2  C2  2 


3)  9?k+1)  -  <z<2k+1\r<2k+1)) 


2k+2 


ri  -1  0(2k+l)/.(2k)1-l 

[1  '  u2k+l02  /91  1 


4) 
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5) 


*  ( 2k )  _  r  (2k)  ,  -lWl  -1  v  .  (2k-2), 

4  1  “2k+l“2k+2 f  1  +  (1  w2k)(1  “  w2k+l)Axl  1 


6) 


*(2k+2)  _  x(2k)  +  ^Uk) 


7) 


(2k+2)  N  (2k)  _  (2k+l ) 

rl  *  (1  -  a)2k+2  ri  '  “2k+2F22 


8) 

9) 


0 


(2k+2) 

(2k+2) 

1 


_-l  (2k+2) 
C1  rl 


/  (2k+2)  (2k+2K 

<21  *rl  5 


,0) 


2k+3 
(m) 


u  -  “a+2<e!2k+2)/e22k+1>  J  r1 


c)  Once  Xj“'  has  been  obtained  to  the  desired  accuracy,  calculate 

(m)  . 

x2  using: 


(m) 


C“l(b2  -  FTx^m) ) . 


The  main  advantages  with  this  approach  are  that  the  algorithm  is 
working  with  1/2  the  number  of  unknowns  each  iteration,  and  that 
each  iteration  of  Algorithm  3.4  is  equivalent  to  two  iterations  of 
Algorithm  3.3. 
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A*  Investigative  Process 
A_.  1_.  Introduction 

The  investigation  will  be  divided  into  three  phases.  In  the  first 
phase  I  examine  a  group  of  preconditioning  strategies  arising  from  the 
ideas  of  Meijerink  and  van  der  Vorst,  Gustafsson,  and  Munksgaard,  and 
determine  how  they  compare  to  one  another  on  a  given  set  of  problems. 
The  preconditioning  strategies  will  be  judged  on  how  they  influence  the 
eigenvalue  distribution  of  our  test  matrices,  and  their  effect  on  the 
rate  of  convergence  and  amount  of  work  required  by  a  standard  CG 
algorithm  to  obtain  a  given  relative  error. 

The  second  phase  consists  of  analyzing  each  preconditioning 
strategy  and  determining  which  ones  might  be  easily  adaptable  to  our 
multiprocessor  system.  A  prime  consideration  is  to  identify  those 
preconditioning  strategies  that  minimize  the  total  amount  of  work, 
including  the  amount  of  interprocessor  communication  required  to 
construct  the  preconditioning  matrix  C  and  to  solve  the  systems 
z  -  C-lr. 

From  the  results  of  the  first  two  phases,  I  will  narrow  the  list  of 
possible  strategics  to  two  or  three  prime  candidates  for  preconditioning 
on  multiprocessors.  The  third  phase  then  consists  of  analyzing  the 
effects  of  these  strategies  on  larger  and  more  complex  test  problems.  I 
will  also  examine  what  effect  various  values  of  <J>,  the  Interprocessor 
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communications  cost  parameter,  might  have  on  our  choice  of  a 
preconditioning  strategy.  The  numerical  experiments  required  during 
Phase  I  and  Phase  III  will  be  conducted  on  the  CDC-Cyber  175  at  the 
University  of  Illinois,  for  which  the  arithmetic  precision  is  roughly 
14  decimal  digits. 

4^.2^.  Software 

In  conducting  the  numerical  experiments,  I  relied  heavily  on  the 
Harwell  sparse  matrix  routines  MA31  and  EA14A.  The  MA31  package  served 
as  the  basis  for  the  incomplete  factorization  and  conjugate  gradient 
routines.  A  complete  description  of  these  routines  can  be  found  in 
[Munk80].  The  program  listings  and  on-line  write-ups  for  the  MA31 
package  are  available  in  the  Cyber  Harwell  library  under  the  name  MA31A. 

The  conjugate  gradient  routine  MA31F,  contained  in  this  package, 
was  slightly  modified.  Originally,  it  chose  as  its  initial  guess 

Xq  =  C~lb. 

In  order  to  make  it  more  difficult  for  the  algorithm  and  to  get  a  better 
idea  of  how  the  preconditioning  would  effect  convergence,  I  replaced  b 
by  a  vector  with  random  entries  between  0  and  2. 

The  eigenvalues  of  our  symmetrically  preconditioned  matrices  were 
found  using  a  Lanczos  algorithm  as  implemented  in  the  Harwell  routine 
EA14A.  This  algorithm  finds  the  eigenvalues  without  regard  to  their 
multiplicity.  A  complete  description  of  this  routine  can  be  found  in 
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[PaRe81].  The  only  modification  made  was  to  replace  the  Harwell  random 
number  function  FACIAS  by  the  CDC  Fortran  function  RANF.  The  complete 
program  listings  and  write-ups  for  this  routine  should  be  available 
shortly  in  the  Cyber  Harwell  library. 

This  routine  requires  that  the  user  supply  the  necessary  code  to 
calculate  u  *  u  +  Av  each  iteration,  where  the  subroutine  EA14A  supplies 
the  vectors  u  and  v.  Since  we  are  working  with  a  symmetrically 
preconditioned  matrix  A,  we  actually  need  to  calculate 

u  =  u  +  L  ^AL  *v  (4.1) 

T 

where  C  *  LL  . 

This  was  done  using  the  Harwell  subroutines  Ma31G  and  MA31H.  The 
subroutine  MA31G  solves  the  system 


x  =  (LLT)  1y 

using  backward  and  forward  substitution.  1  broke  this  into  two  separate 
subroutines;  MA31G1  to  do  the  backward  substitution,  and  MA31G2  to  do 
the  forward  substitution.  The  subroutine  MA31H  is  used  to  calculate 
Ax  =  y.  Using  these  three  routines,  we  can  solve  equation  (4.1)  in  the 
following  four  steps: 

1)  solve  tj  *  L  *v  using  MA31G2 

2)  calculate  t^  a  At^  using  MA31H 
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3)  solve  t^  ■  L  Tt2  using  MA31G1 

4)  calculate  u  “  u  +  t^. 

Appendix  F  contains  source  listings  for  the  programs  I  created,  and 
those  Harwell  routines  which  I  modified. 


4.3.  Preconditioning  Strategies 

The  following  Is  a  list  of  abbreviations  and  descriptions  of  the 
preconditioning  strategies  that  I  have  examined. 

1)  DS  -  Diagonal  Scaling 

This  method  uses  C  ■  dlag(A)  as  its  preconditioning  matrix. 

2)  BDS  -  Block  Diagonal  Scaling 

Similar  to  diagonal  scaling,  this  method  uses  C  *  block  diag(A), 
where  each  principle  submatrix  Is  trl-diagonal. 

3)  IC(s)  -  Incomplete  Cholesky  factorization  with  s  diagonals  added 
This  technique  was  developed  by  Meijerink  and  van  der  Vorst 
[MeVo77].  It  is  normally  associated  with  matrices  generated  using 
the  natural  grid  point  ordering  scheme.  The  case  when  no  fill-ins 
are  kept  during  the  factorization  (s*0),  can  easily  be  generalized 
for  matrices  using  other  grid  point  ordering  schemes.  Here  I  will 
limit  myself  to  the  cases  s  *  0,  1  and  3.  They  utilize  set  P's  of 
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the  form: 

P°  =  Ui.j)  I  A( i , j )  -  0  } 

P1  s  I  li-jl  *  0, 1 } 

P^  =  {(i.j)  I  I  i~  j  I  *  0, l,2,m-2,m-l,m  } 
where  m  is  the  half  band  width  of  the  outer  diagonal. 

4)  MIC(s)  -  Modified  Incomplete  Cholesky  factorization  with  s 
diagonals  added 

Developed  by  Gustafsson  [Gust78],  it  represents  an  extension  of  the 
IC(s)  algorithm  to  include  diagonal  modification. 

5)  HARWELL(c)  -  Harwell  package  MA31  with  drop  tolerance  c 

This  performs  the  incomplete  Cholesky  factorization  as  proposed  by 

Munksgaard  [Munk80]  and  implemented  by  the  Harwell  routine  MA31C. 

It  uses  a  numeric  drop  tolerance  to  control  fill-ins,  and  includes 

diagonal  modification.  It  also  incorporates  minimum  degree 

pivoting  to  minimize  the  number  of  potential  fill-ins  generated.  I 

-2 

will  limit  myself  to  the  two  cases  c=0  and  c=10  .  The  case  c=0 

generates  a  complete  Cholesky  factorization. 

6)  MICD(c)  -  Modified  Incomplete  Cholesky  factorization  with  Drop 


tolerance  c 

Similar  to  the  HARWELL(c)  algorithm,  in  this  case  the  minimum 
degree  pivoting  has  been  eliminated. 
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7)  RBIC(s)  -  Reduced  Block  Incomplete  Choleaky  factorization  with  a 
diagonals  added 

This  ia  similar  to  the  ICC3)  algorithm,  except  that  only  portions 
of  matrix  A  are  used  to  calculate  the  incomplete  Choleaky 
factorization.  Parts  of  matrix  A  are  ignored  in  order  to  break 
matrix  A  into  n/2  uncoupled  systems  of  equations.  The  incomplete 
Cholesky  factorization  on  each  system  can  then  be  performed 
independently.  Using  the  notation  in  Appendix  B,  the  following  is 
how  the  matrices  arising  from  the  various  grid  point  ordering 
schemes  will  be  partitioned: 

a)  Point  Red/Black  Ordering  (Figure  B4) 

The  elements  in  blocks  ^ ,  E* ,  Ft  and  F^  (i  -  l,...,(y  -  1) 

will  be  ignored  during  the  factorization. 

b)  Line  Red/Black  Ordering  (Figure  B2) 

The  elements  in  blocks  £2^  (i  “  l,***»(y  “  D)  will  be 

ignored. 

c)  2  Line  Red/Black  Ordering  (Figure  B3) 

The  elements  in  blocks  £2^  “  1»**,>('^“D)  will  be 

ignored. 

1  will  limit  myself  to  the  case  s-0,  except  when  working  with  the 

2  Line  Red/Black  matrices,  where  I  will  also  examine  the  case  s«»3. 
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Each  of  these  preconditioning  strategies  was  not  necessarily  matched 
with  each  of  the  grid  point  ordering  schemes.  Table  4.1  shows  which 
combinations  were  examined. 


Preconditioning 

Grid  Point 

Ordering  Schemes 

Strategy 

Natural 

Point  R/B 

Line  R/B  2- 

line  R/B 

DS 

X 

X 

BDS 

X 

X 

X 

IC(0) 

X 

X 

X 

X 

IC(1) 

X 

IC(3) 

X 

MIC(O)  , 

MICD(10  ) 

X 

X 

X 

X 

X 

X 

X 

X 

MICD(O)  „ 

HARWELL ( 1 0*; 

X 

X 

X 

X 

X 

X 

X 

X 

HARWELL (0) 

X 

X 

X 

X 

RBIC(O) 

X 

X 

X 

RBIC(3) 

X 

Table  4.1 


1 

4 

.  i 

'1 

I 


i 


1 


4^4_._1_.  Introduction 

During  this  phase,  I  was  interested  in  determining  how  the  various 
chosen  preconditioning  strategies  compare  to  one  another.  I  limited 
myself  to  comparing  them  relative  to  test  matrices  of  order  64  arising 
from  test  problem  1  with  n=8  (see  appendix  D).  Appendix  C  outlines 
which  combinations  of  preconditioning  strategies  and  grid  point  ordering 
schemes  I  looked  at . 

A  prime  consideration  when  choosing  an  algorithm  for  this  type  of 
problem  is  the  amount  of  work  required  to  generate  an  acceptable  answer. 
Keeping  this  in  mind,  I  determined  the  amount  of  time  and  number  of 
iterations  required  by  our  PCG  algorithm  to  produce  an  answer  such  that 

1 1  r 1 1 J  <  10-6 
where  r^  =»  Ax^  -  b. 

This  was  subdivided  into  the  time  required  to  compute  the 
preconditioning  matrix  and  that  required  to  actually  perform  the  PCG 
iterations. 

Another  means  of  comparing  preconditioning  strategies  is  to  examine 
their  effect  on  the  eigenvalue  distribution  of  the  test  matrices. 
Ideally,  the  eigenvalues  of  the  symmetrically  preconditioned  test 
matrices  should  be  clustered  around  one.  In  an  effort  to  gauge  this,  I 
used  the  Harwell  routine  EA14A  to  calculate  all  the  distinct  eigenvalues 
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(\^)  of  the  symmetrically  preconditioned  matrices  to  an  accuracy  of 
-4 

10  .  I  then  calculated  the  range,  mean  and  standard  deviation  of 
c X±  -  1.0).  The  more  successful  the  strategy,  the  closer  these  values 
will  be  to  zero. 

The  conclusions  reached  during  this  phase  are  not  necessarily 
intended  to  hold  for  larger  and  more  complex  problems.  A  much  wider 
variety  and  size  of  test  problems  would  have  to  have  been  considered. 
Such  a  comprehensive  study  is  beyond  the  scope  of  this  paper.  More 
exhaustive  studies  comparing  various  subsets  of  these  preconditioning 
strategies  with  respect  to  sequential  machines  only  can  be  found  in 
[MeVo77],  [Gust78] ,  and  [Munk80]. 


4^.2^  Software 

A  modified  version  of  the  Harwell  incomplete  factorization  routine 
MA31C  will  be  used  to  generate  all  the  various  types  of  factorizations 
required  during  this  phase.  As  written,  it  performed  the  incomplete 
factorization  using  a  numeric  drop  tolerance,  diagonal  modification,  and 
minimum  degree  pivoting.  To  allow  the  routine  to  handle  a  wider  variety 
of  factorizations,  I  made  the  minimum  degree  pivoting  and  diagonal 
modification  user  controlled  options.  I  also  allowed  the  user  to  choose 
either  a  numeric  drop  tolerance  or  a  user  defined  function  FILL  to 
control  fill-ins.  The  function  FILL  would  decide  if  a  zero  should  be 
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destroyed  by  considering  only  Its  coordinates,  and  would  be  similar  in 
nature  to  the  set  P  of  Meijerink  and  van  der  Vorst  [MeVo77]. 

The  routine  MA31A,  used  to  activate  MA31C,  was  also  changed.  It 
had  been  used  to  prepare  the  data  structures  required  during  the 
incomplete  factorization.  Its  duties  were  taken  over  by  my  routine 
FACTOR.  Eliminated  was  the  automatic  diagonal  scaling  of  matrix  A. 
This  necessitated  a  change  to  another  Harwell  routine  MA31H,  used  to 
calculate  Ax  *  y.  No  longer  does  this  routine  assume  Diag(A)  *  I.  I 
also  added  to  FACTOR  an  option  to  allow  the  user  to  specify  which 
portions  of  matrix  A  would  be*  used  in  calculating  the  incomplete 
factorization.  This  was  done  using  a  user  defined  function  EUSE  which, 
when  activated,  identifies  which  portion  of  matrix  A  is  to  be  passed  on 
to  subroutine  MA31C. 

It  should  be  noted  that,  while  these  modifications  do  allow  a 
greater  variety  of  preconditioning  strategies  to  be  implemented,  the 
process  at  times  is  far  from  efficient.  As  a  result,  the  time  required 
to  perform  some  of  the  incomplete  factorizations  will  be  inflated.  This 
is  especially  true  for  the  IC(s)  and  RBIC(s)  factorizations.  Normally, 
the  locations  of  the  non-zero  entries  In  the  factorization  are  known 
beforehand,  and  only  those  values  need  be  calculated.  Here,  most  of  the 
work  required  to  generate  a  potential  fill-in  is  done  before  the  program 
decides  to  keep  it  or  not.  This  results  in  more  values  being  calculated 


than  need  be 


The  execution  time  of  the  factorization  (MA31C)  and  the 
preconditioned  conjugate  gradient  (MA31F)  routines  will  be  determined 
using  the  CDC  Fortran  function  SECOND.  This  function  returns  the 
central  processor  time  from  start-of-job  in  seconds.  The  difference 
between  the  values  recorded  at  the  start  and  end  of  a  routine  will  be 
its  execution  time.  The  values  returned  by  function  SECOND  are  usually 
accurate  to  two  decimal  places. 

The  statistics  on  the  calculated  eigenvalues  will  be  generated 
using  the  CDC  Math/Science  Library  routines  DSCRPT  and  DSCRP2.  The 
source  code  for  both  routines  is  in  the  Cyber  MSL  Library. 

4^j4. 3_.  Results 

The  results  of  the  numerical  experiments  have  been  tabulated  and 
placed  in  Tables  4.2  -  4.9  and  Graphs  4.1  -  4.5  at  the  end  of  this 
section.  First,  I  will  discuss  some  general  observations  about  the 
data.  I  will  then  look  at  each  preconditioning  strategy  separately, 
discuss  how  it  relates  to  the  other  preconditioning  strategies,  and  what 
effect  the  different  grid  point  ordering  schemes  may  have  had  upon  it. 

There  exists  a  definite  correlation  between  the  number  of 
iterations  required  to  solve  the  preconditioned  system  and  the  size  of 
the  spectral  radius,  and  the  range  and  standard  deviation  of  the 
resulting  eigenvalues.  The  smaller  the  spectral  radius,  the  range,  and 
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the  standard  deviation  of  the  eigenvalues,  the  fewer  the  number  of 
iterations.  In  most  cases,  the  mean  is  also  reasonably  close  to  zero. 
This  supports  the  idea  that  the  closer  the  eigenvalues  of  the 
symmetrically  preconditioned  matrix  are  clustered  around  one,  the  faster 
the  method  will  converge.  Such  observations,  however,  did  not  hold  for 
the  MIC(O)  preconditioning  strategy.  Unfortunately,  I  have  not  been 
able  to  explain  why.  From  this  data,  it  is  clear  that  the  distribution, 
rather  than  the  number  of  distinct  eigenvalues  is  the  characteristic 
relevant  to  the  rate  of  convergence.  In  fact,  the  non-preconditioned 
matrix  is  the  matrix  with  the  fewest  distinct  eigenvalues. 

As  a  result  of  the  relatively  small  size  of  our  test  system,  the 

times  consumed  by  the  various  preconditioned  C.G.  algorithms  are 

clustered  together.  If  any  method  could  be  classified  as  the  fastest, 
-2 

the  Harwell(10  )  would  probably  be  the  one.  It  registered  a  time  of 
0.03  second  when  matched  with  the  point  red/black  matrix  and  the  2-line 
red/black  matrix.  From  this  data  alone  however,  it  is  difficult  to 
conclude  whether  the  difference  in  times  resulting  from  the  various 

ordering  schemes  is  significant.  When  the  Harwell(10  )  method  is 

_2 

compared  to  its  sister  method  MICD(10  ),  the  benefits  of  minimum  degree 

_2 

pivoting  (Harwell(10  ))  are  clearly  evident.  In  each  case,  the 
_2 

Harwell(10  )  method  produced  better  results  in  every  category  than  did 
-2  -2 

the  MICD(10  )  method.  The  MICD(10  )  method  also  proved  extremely 
sensitive  to  the  type  of  ordering  scheme  used. 
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Had  the  IC(n)  methods  been  more  efficiently  implemented,  they  would 

_2 

have  matched  the  efficiency  of  the  Harwell(10  )  method.  That  aside, 

they  were  still  very  competitive.  The  IC(0)  method  proved  to  be  a 

substantial  improvement  over  the  BDS  method  in  all  areas.  The  results 

for  the  IC(0)  method  fluctuate  slightly  depending  on  the  grid  point 

ordering  scheme  used.  It  is  unclear  whether  or  not  these  changes  are 

significant.  Additional  tests  would  have  to  be  conducted.  The  IC(1) 

method  made  modest  additional  improvements  to  both  the  eigenvalue 

distribution  and  rate  of  convergence.  The  timing  data  between  the  IC(0) 

and  IC(1)  methods  is  so  close,  that  it  is  impossible  to  tell  which  is 

more  efficient.  The  IC(3)  method,  on  the  other  hand,  while  making 

additional  improvements  on  the  eigenvalue  distribution,  did  not  improve 

the  rate  of  convergence  enough  to  outweigh  the  increased  cost  of  the 

factorization.  As  a  result,  it  is  less  desirable  than  the  IC(0)  or 

IC Cl)  methods.  However,  the  results  of  Meijerink  and  van  der  Vorst 

[MeVo77]  show  that  for  larger  systems,  the  IC(3)  method  is  indeed 

_2 

superior.  How  the  IC(3)  method  would  compare  to  the  Harwell(10  ) 
method  on  larger  systems  has,  to  my  knowledge,  not  been  thoroughly 
explored. 

The  MIC(O)  method  proved  extremely  sensitive  to  the  type  of  grid 
point  ordering  scheme  used.  It  had  the  most  trouble  with  the  point 
red/black  matrix.  Here  the  process  became  unstable  and  six  diagonal 
elements  had  to  be  changed,  using  Kershaw's  technique  [Kers78],  to  keep 
the  factorization  positive  definite.  On  the  other  hand,  with  a 
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naturally  ordered  matrix,  It  seemed  to  be  fairly  competitive  as  far  as 
the  time  required  to  obtain  an  answer.  The  eigenvalue  distribution, 
however,  suffered  as  compared  to  the  IC(n)  methods.  While  the  data  here 
indicates  the  MIC(O)  method  slightly  inferior  to  the  IC(0)  method, 
Gustafsson  [Gust78],  using  naturally  ordered  matrices,  showed  that  for 
larger  systems  the  MIC(n)  methods  required  fewer  iterations  than  the 
corresponding  IC(n)  methods. 

The  Harwell(O)  method  seems  to  be  surprisingly  competitive, 
considering  that  it  represents  a  complete  factorization.  However,  the 
results  of  Munksgaard  [Munk80]  show  that,  as  would  be  expected,  this 
competitiveness  does  not  extend  to  larger  systems.  When  compared  to  the 
other  complete  factorization  method,  MICD(O),  the  benefits  of  minimum 
degree  pivoting  are  again  clearly  evident.  In  each  case,  the  Harwell(O) 
method  produced  fewer  fill-ins  and  required  substantially  less  time  to 
perform  the  factorization.  Another  interesting  observation  is  that  the 
Harwell(O)  method  was  not  influenced  by  the  grid  point  ordering  scheme 
used,  while  the  MICD(O)  was  definitely  sensitive  to  the  ordering  scheme 
used.  For  the  MICD(O)  method,  the  number  of  elements  in  the  lower 
triangular  part  of  the  factorization  varied  from  584  to  326.  This  was 
reflected  in  the  time  required  to  calculate  the  factorization,  which 
varied  accordingly. 

The  DS  and  BDS  methods  were  somewhat  disappointing.  The  DS  method, 
while  improving  the  eigenvalue  distribution  tremendously,  did  little  to 


improve  the  rate  of  convergence  associated  with  our  conjugate  gradient 
routine.  The  BDS  method  was  equally  ineffectual.  It  produced  almost  no 
improvement  in  the  eigenvalue  distribution  over  the  DS  method,  and  only 
a  modest  improvement  in  the  rate  of  convergence.  Unfortunately,  this 
improvement  in  the  rate  of  convergence  was  overshadowed  by  the  cost  of 
the  factorization.  As  we  will  see  in  section  4.6.3,  the  BDS  method  is 
not  as  worthless  as  these  results  would  indicate. 

The  RBIC(O)  method  proved  to  be  reasonably  successful.  Where  they 
could  be  compared,  its  results  fall  almost  exactly  half  way  between 
those  of  the  BDS  and  IC(O)  methods.  Only  minor  fluctuations  in  results 
occurred  between  the  various  grid  point  ordering  schemes.  The  RBIC(3) 
method,  on  the  other  hand,  proved  to  be  a  major  disappointment.  In 
every  category,  it  was  inferior  to  the  RBIC(O)  method.  It  is  unclear 
whether  these  results  are  characteristic  of  the  RBIC(3)  method,  or 
simply  a  consequence  of  the  size  of  the  test  problem. 
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Natural  Ordering 


FACTOR 

SOLVE 

Preconditioning 

Number  of 

Number 

Total 

Method 

Elements 

Time 

of 

Time 

Time 

in  L 

Iterations 

None 

0 

0.0 

24 

0.05 

0.05 

DS 

0 

0.0 

24 

0.05 

0.05 

BDS 

56 

0.01* 

20 

0.05 

0.06 

IC(0) 

112 

0.01* 

10 

0.03 

0.04 

IC(1) 

161 

0.02* 

7 

0.02 

0.04 

IC(3) 

245 

0.03* 

6 

0.02 

0.05 

MIC(O) 

112 

0.01 

1 1 

0.03 

0.04 

MICD(10"  ) 

249 

0.03 

6 

0.02 

0.05 

MICD(O)  - 

455 

0.07 

1 

0.01 

0.08 

HARWELL(10  ) 

210 

0.03 

4 

0.01 

0.04 

HARWELL (0) 

290 

0.03 

1 

0.01 

0.04 

Table  4.2  -  Timing  and 

convergence 

data 

resulting  from 

solving  the 

problem. 


GETEIG 


Preconditioning 

Spectral 

Number 
of  Distinct 

Statistics  on 

(xi  -  1.0) 

Method 

Radius 

Eigenvalues 

Range 

Mean 

Std.  Dev. 

None 

8.876 

33 

7.518 

3.00 

2.059 

DS 

2.219 

33 

1.879 

-0.26E- 

6  0.515 

BDS 

2.377 

64 

1.773 

-0.32E- 

6  0.399 

IC(0) 

1.329 

54 

0.858 

-0.0219 

0.164 

IC(1) 

1.205 

55 

0.512 

-0.0065 

0.074 

IC(3) 

1.108 

45 

0.245 

-0.0029 

0.036 

MIC(0) 

2.960 

49 

1.499 

0.361 

0.347 

MICD(10  ) 

1.220 

57 

0.169 

0.025 

0.033 

MICD(0) 

1.00 

1 

0.0 

0.0 

0.0 

HARWELL (10  ) 

1.076 

38 

0.057 

0.013 

0.014 

HARWELL (0) 

1.00 

1 

0.0 

0.0 

0.0 

Table  4.3  -  Data  on  the  eigenvalue  distribution 
preconditioned  test  matrix. 

of  the 

symmetrically 
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Point  Red/Black  Ordering 


FACTOR 

SOLVE 

Preconditioning 

Number  of 

Number 

Method 

Elements 

Time 

of 

Time 

in  L 

Iterations 

DS 

0 

0.0 

24 

0.05 

RBIC(0) 

88 

0.01* 

17 

0.04 

IC(O) 

112 

0.01* 

13 

0.03 

MIC(0)  ? 

112 

0.01* 

24 

0.06 

MICD(10  ) 

241 

0.03 

5 

0.02 

MI CD (0 ) 

326 

0.04 

1 

0.01 

HARWELL(10  ) 

211 

0.02 

4 

0.01 

HARWELL (0) 

290 

0.03 

1 

0.01 

Table  4.4  -  Timing  and 

convergence 

data 

resulting  from 

solving 

problem. 


GETEIG 


Preconditioning 

Spectral 

Number 
of  Distinct 

Statistics  on  (\ 

Method 

Radius 

Eigenvalues 

Range 

Mean 

DS 

2.194 

33 

1.879 

0.16E-6 

RBIC(0) 

1.660 

52 

1.333 

-0.004 

IC(0) 

1.438 

31 

1.172 

0.033 

MIC(0) 

18.827 

32 

13.951 

3.821 

MICD(10  ) 

1.129 

31 

0.098 

0.019 

MICD(0)  _ 

1.00 

1 

0.0 

0.0 

HARWELLdO  ) 

1.076 

34 

0.063 

0.015 

HARWELL (0) 

1.00 

1 

0.0 

0.0 

Total 

Time 


0.05 

0.05 

0.04 

0.07 

0.05 

0.05 

0.03 

0.04 

the  test 


-  1.0) 

Std.  Dev. 

0.515 

0.346 

0.292 

4.539 

0.021 

0.0 

0.016 

0.0 


Table  4.5  -  Data  on  the  eigenvalue  distribution  of  the  symmetrically 
preconditioned  test  matrix. 
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Line  Red/Black  Ordering 


FACTOR 

SOLVE 

Preconditioning 

Total 

Number  of 

Number 

Method 

Elements 

Time 

of 

Time 

Time 

in  L 

Iterations 

BDS 

56 

0.01* 

20 

0.05 

0.06 

RBIC(O) 

88 

0.01* 

16 

0.04 

0.05 

IC(0) 

112 

0.01* 

11 

0.03 

0.04 

MIC(O)  , 

112 

0.01 

14 

0.04 

0.05 

MICD(10  ) 

301 

0.04 

6 

0.02 

0.06 

MICD(O)  2 

584 

0.09 

1 

0.01 

0.10 

HARWELL(10"  ) 

219 

0.03 

4 

0.01 

0.04 

HARWELL (0) 

290 

0.03 

1 

0.01 

0.04 

Table  4.6  -  Timing  and 

convergence 

data 

resulting  from 

solving 

the 

problem. 


GETEIG 


Preconditioning 

Spectral 

Number 
of  Distinct 

Statistics  on 

O 

• 

f-l 

r< 

Method 

Radius 

Eigenvalues 

Range 

Mean 

Std.  Dev, 

BDS 

2.180 

64 

1.773 

-0.150 

0.399 

RBIC(0) 

1.722 

56 

1.363 

-0.0047 

0.321 

IC(0  ) 

1.406 

61 

1.000 

-0.013 

0.173 

MIC(Q)  2 

13.192 

53 

9.174 

0.653 

1.359 

MICDdO"  ; 

1.377 

52 

0.288 

0.045 

0.059 

MICD(0) 

1.00 

1 

0.0 

0.0 

0.0 

HARWELLdO  ) 

1.072 

33 

0.056 

0.013 

0.014 

HARWELL (0) 

1.00 

1 

0.0 

0.0 

0.0 

Table  4.7  -  Data  on  the  eigenvalue  distribution  of  the  symmetrically 
preconditioned  test  matrix. 
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2-Llne  Red/Black  Ordering 

FACTOR  SOLVE 


Preconditioning 

Number  of 

Number 

Total 

Method 

Elements 

Time 

of 

Time 

Time 

in  L 

Iterations 

BDS 

56 

0.01* 

20 

0.05 

0.06 

RBIC(O) 

88 

0.01* 

15 

0.04 

0.05 

RBIC(3 ) 

164 

0.02* 

16 

0.05 

0.07 

IC(0) 

112 

0.01* 

11 

0.03 

0.04 

MIC(O) 

112 

0.01 

12 

0.03 

0.04 

MICD(10  ) 

280 

0.03 

7 

0.03 

0.06 

MICD(O) 

562 

0.09 

1 

0.01 

0.10 

HARWELLdO  ) 

208 

0.02 

4 

0.01 

0.03 

HARWELL (0) 

290 

0.03 

1 

0.01 

0.04 

Table  4.8  -  Timing  and  convergence  data  resulting  from  solving  the  test 
problem. 


GETEIG 


Preconditioning 

Spectral 

Number 
of  Distinct 

Statistics  on  (\^ 

-  1.0) 

Method 

Radius 

Eigenvalues 

Range 

Mean 

Std.Dev 

BDS 

2.196 

64 

2.773 

-0.12E-6 

0.399 

RBIC(0) 

1.770 

55 

1.363 

-0.004 

0.324 

RBIC(3 ) 

2.052 

49 

1.575 

-0.128E-3 

0.349 

IC(0) 

1.319 

60 

0.904 

*0.015 

0.163 

MIC(0) 

5.449 

51 

3.341 

0.440 

0.562 

MICD(10  ) 

1.484 

53 

0.376 

0.046 

0.072 

MICD(0)  - 

1.00 

1 

0.0 

0.0 

0.0 

HARWELLdO  ) 

1.073 

34 

0.058 

0.014 

0.015 

HARWELL (0) 

1.00 

1 

0.0 

0.0 

0.0 

Table  4.9  -  Data  on  the  eigenvalue  distribution  of  the  symmetrically 
preconditioned  test  matrix. 


*  -  Tests  using  the  routines  from  Phase  III  show  that  these  values  could 
be  reduced  by  up  to  a  factor  of  3  if  the  corresponding  preconditioning 
strategy  had  been  efficiently  implemented. 
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Point  Red/Black  Ordering 


ITERATION 

Graph  4.3  -  Shows  the  log  base  10  of  the  residual  norm  as  a  function  of 
the  iteration  number. 
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Line  Red/Black  Ordering 


Graph  4.4  -  Shows  the  log  base  10  of  the  residual  norm  as  a  function  of 
the  iteration  number. 
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A*JL*  phase  n 

4^  ju  1_.  Introduction 

During  this  phase,  I  attempted  to  analyze  each  of  the 
preconditioning  strategies  and  determine  how  easily  they  could  be 
adapted  to  our  multiprocessor.  I  assumed  that  matrix  A  is  of  order  nm 
and  that  my  multiprocessor  consisted  of  n/2  processors  (p-n/2).  Under 
these  assumptions.  Table  El  (Appendix  E)  shows  the  steps  involved  in 
solving  a  system  of  equations  using  our  preconditioned  conjugate 
gradient  algorithm.  Also  included  are  their  relative  cost  in  arithmetic 
operations  and  the  amount  of  data  that  must  be  passed  between 
processors.  Where  appropriate,  the  relative  cost  of  performing  a 
particular  factorization  was  determined,  as  well  as  the  cost  of  using  it 
to  solve  the  system  of  equations  z  a  C  r. 

As  a  matter  of  terminology,  I  assumed  that  each  factorization 
produced  a  preconditioning  matrix  of  the  form 

'"wv«rw*T 

C  -  LDL 

where  L  is  a  unit  lower  triangular  matrix  and 
D  is  a  positive  diagonal  matrix. 

The  system  z  -  C_1r  was  solved  using  forward  and  backward  substitution 
in  the  following  manner: 

Lt  ■  r 


~T  ~-l 
L  z  “  D  t. 
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In  attempting  to  analyze  each  of  these  events,  I  relied  heavily  on  the 
notation  defined  in  figures  B1  -  B4  of  Appendix  B.  Furthermore,  I 
assume  that  vectors  x,  b,  r,  z,  and  t,  and  matrices  L  and  D  are 
partitioned  in  the  same  manner  as  matrix  A.  Also,  if  matrix  A  contains 
a  block  and  an  element  a£j»  the  and  a^j  represent  the 
corresponding  block  and  element  in  L,  respectively.  Figure  4.1  shows 
which  blocks  of  matrix  A,  of  the  unknown  vector  x,  and  of  the  right-hand 
side  vector  b  are  stored  in  processor  i  (i  *  2,  •••,$■  -  1)  for  each  of 
Che  grid  point  ordering  schemes.  For  processor  1  and  n/2,  the  storage 
requirements  are  slightly  different,  in  that  certain  blocks  mentioned  in 
figure  4.1  are  undefined.  Processor  i  is  used  to  calculate  and  store 
the  portions  of  vectors  r,  z,  and  t,  and  matrix  D  corresponding  to  those 
portions  of  vector  x  referred  to  in  figure  4.1,  as  well  as  portions  of  L 
Chat  correspond  to  those  blocks  of  matrix  A  cited  in  figure  4.1. 
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Data  Initially  stored  in  Processor  i  for  a 


Line 

Point 

2 -Line 

Naturally 

Red/Black 

Red/Black 

Red/Black 

Ordered 

Ordered 

Ordered 

Ordered 

Matrix 

Matrix 

Matrix 

Matrix 

T 

i2i-l 

T 

i2i-l 

Di 

T2i 

T2i 

°i+n/2 

Fj  J- 

2  [i/2j 

E2i-2 

E2i-2 

Bi 

Ck  k  = 

2  fi/2 1 

E2i-1 

E2i-1 

Ei-1 

*2i-l 

E2i 

E2i 

Ei 

X2i 

X2i-1 

^i-l 

Fi-1 

b2i-l 

b2i-l 

b2i 

*2i 

b2i-l 

b2i 

Fi 

Xi 

Xl+n/2 

bi 

bi+n/2 

b2i 

Figure  4.1 
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4^.  5^._2 .  Results 

First,  I  looked  at  those  preconditioning  strategies  whose 
suitability  for  our  multiprocessor  is  not  influenced  by  the  grid  point 
ordering  scheme  used.  These  include  the  DS,  BDS,  Harwell(c)  and  RBIC(n) 
methods.  The  Harwell(c)  method  is  the  only  one  from  this  group  that 
would  be  extremely  difficult  to  implement.  The  minimum  degree  pivoting 
would  require  exorbitant  amounts  of  interprocessor  communications. 

The  remaining  three  methods  from  this  group  can  all  be  easily  • 

adapted  to  our  multiprocessor.  The  DS  is  by  far  the  simplest.  No  work 
is  required  during  the  factorization  phase,  with  L  *  I  and  D  =  diag(A). 

Solving  the  system  z  *  C  *r  is  simply  a  matter  of  calculating 
z^  =  ^i^ri*  wllich  can  be  done  in  m  arithmetic  operations  with  no 
interprocessor  communications  required.  Each  processor  would  solve  two 
such  systems  for  a  total  of  2m  arithmetic  operations. 

The  BDS  method  is  equally  simple.  Here,  processor  i  is  required  to 
perform  the  factorization  of  two  uncoupled  tri-diagonal  matrices 
and  This  will  require  ~6m  arithmetic  operations  per  processor. 

Solving  the  system  z  =  C  *r  is  equivalent  to  solving  n  uncoupled  systems 
of  the  form  »  T^r^  =  l»  n).  Each  processor  will  then  solve 

two  of  these  systems,  requiring  a  total  of  ~10m  arithmetic  operations 
and  no  interprocessor  data  transfers. 

The  RBIC(n)  method,  by  its  very  design,  is  ideally  suited  for  our 
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e ,  : -  -b  .  , c  .  , 
j  J-l  J-l 


cj  :'cj/dj  aj  “  1 
f or  j  =  1 ,  • • • ,  2m 

where  any  elements  not  defined  (ie.  subscripts  <  0) 
are  assumed  to  be  zero. 

When  simplified,  we  find  that  the  RBIC(3)  factorization  requires  ~27m 
arithmetic  operations  per  processor. 


—  J 

Solving  z  =  C  r,  when  matrix  C  is  given  as  LDL  ,  requires  approximately 
2(NZL)  arithmetic  operations  to  solve  Lt  *  r  and  2(NZL)  +  2m  to  solve 
L  z  »  D  *t,  where  NZL  is  the  number  of  non-zero  off-diagonal  elements  in 
L.  After  the  RBIC(O)  factorization,  NZL  will  equal  3m-2,  while  after 
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the  RBIC(3)  factorization  NZL  will  equal  ~6m.  This  means  that  ~14m 
arithmetic  operations  per  processor  are  required  if  the  RBIC(O)  is  used, 
while  if  the  RBIC(3)  is  used,  ~26m  arithmetic  operations  per  processor 
are  needed.  In  either  case,  no  tnterprocessor  data  transfers  are 
required  during  the  factorization  phase  or  while  solving  2  *  C  *r. 

Next,  I  will  look  at  those  preconditioning  strategies  that  require 
a  certain  number  of  fill-ins  be  kept,  or  at  least  calculated,  during  the 
factorization.  These  methods  include  MICD(c),  MIC(s),  and  IC(s)  for 
s>0.  Unfortunately,  including  fill-ins  greatly  complicates  the  process. 
They  increase  the  interdependence  between  processors  both  during  the 
factorization  phase  and  while  solving  z  ■  C  ^r.  For  example, 
processor  i  may  be  forced  to  wait  for  processor  i-1  to  finish 
calculating  before  it  can  proceed  with  its  work.  As  a  result,  only  a 
fraction  of  our  n/2  processors  may  be  able  to  operate  concurrently. 
This  greatly  reduces  the  advantage  of  having  those  n/2  processors.  The 
choice  of  grid  point  ordering  scheme  can  reduce  the  severity  of  this 
problem  somewhat,  but  not  enough  to  make  any  of  these  methods  suitable 
for  our  multiprocessor. 

Finally  we  come  to  the  IC(0)  method.  Unlike  the  other  methods,  its 
suitablity  is  influenced  by  the  grid  point  ordering  scheme  used.  If  we 
are  working  with  a  naturally  ordered  matrix,  the  factorization  process 
is  recursive  in  nature.  We  find  that  processor  1  cannot  start  its  part 
of  the  factorization  process  until  processor  i-1  has  started  calculating 
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°2(i-l)*  These  values  are  needed  by  processor  i  before  it  can  start 
calculating  essence»  only  two  processors  will  be  able  to 
function  concurrently  while  performing  the  factorization.  A  similar 
problem  arises  when  solving  z  =  C  ^r. 


Changing  to 

the 

2-line 

red/black  ordering  does  not  help 

i  the 

situation  that 

much. 

The  only  advantage 

gained 

is  that  now 

Ln/4J 

processors  can 

be 

working 

concurrently 

while 

performing 

the 

factorization. 

The 

remaining 

|n/4j  processors  must 

still  wait 

until 

these  processors  have  calculated  Che  data  they  need.  This  is  still  an 
undesirable  situation. 

The  line  red/black  ordering  produces  a  matrix  much  more  suited  for 
performing  the  IC(O)  factorization  on  our  multiprocessor.  Notice  that 
the  blocks  T2^_i  (i  ■  l»  n/2)  are  not  directly  interrelated.  This 

means  that  processor  i  can  factor  T2i_j  without  any  interprocessor 
communication.  This  requires  ~3m  arithmetic  operations.  For 
processor  i  to  complete  the  factorization,  it  must  now  get  the  values 
°2i+l  ^rom  Processor  i+1*  With  these  m  values,  processor  1  can  finish 
the  factorization  in  ~9m  arithmetic  operations.  An  additional  m 
arithmetic  operations  are  required  to  calculate  ^2±-2  =  D2i-lE2i-2‘ 
These  last  values  will  be  needed  by  processor  i  to  solve  z  *  C  *r.  This 
makes  a  total  of  ~13m  arithmetic  operations  and  m  data  transfers  per 
processor  to  calculate  the  IC(0)  factorization. 

Solving  the  system  z  =  C  ^r  can  also  be  easily  done  in  this  case. 


i 
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During  the  forward  substitution  phase  (Lt  =  r) ,  t ^  can  be  found  in 

~2m  arithmetic  operations  with  no  interprocessor  communications.  The 

elements  of  t^^  can  then  be  found  in  ~6m  arithmetic  operations,  as  long 

as  the  values  t^i+i  are  obtained  from  processor  i+1.  The  backward 

~T  ~-l 

substitution  process  (L  z  =  D  t)  is  very  similar  in  nature.  The 
elements  of  Z£^  are  first  calculated  using  ~3m  arithmetic  operations  and 
no  interprocessor  communications.  The  values  z2i-2  are  tlien  obtained 
from  processor  i-1.  Then  the  values  z2±-\  are  calculated  using  ~7m 
arithmetic  operations.  The  entire  process  requires  a  total* of  -18m 
arithmetic  operations  and  2m  data  transfers. 

The  point  red/black  matrix  is  equally  suited  for  performing  the 
IC(0)  factorization  on  our  multiprocessor.  In  fact,  it  has  one 
advantage  over  the  line  red/black  matrix  in  that  the  factorization  can 
be  done  without  interprocessor  communications.  The  structure  of  the 
point  red/black  matrix  is  such  that  blocks  D1  through  are  not  altered 
during  the  factorization,  ie.  (i  =  1,  • . . ,  k) .  This  allows  us 

to  store  those  values  of  D^_j  and  D^+j  needed  by  processor  i  during  the 
set-up  phase.  Thus,  if  processor  i  has  blocks  ,  E^j,  E.^,  F^_ ^ ,  F^, 
Dt,  H 

performed  without  any  data  being  transferred  between  processors. 
Processor  i  will  perform  the  following  calculations: 


(1-1)2*  H(i+l)l*  anc*  \+i  avai*able  bo  it,  the  factorization  can  be 
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H 


(1-1)2  *  H< 1— 1)2 


H 


Ji-1 


H(  1— 1)2  Ei—  1 


H(i+l)lFi 


(i+Dl 

T 


*  H( 1+1)1 
~T  v-l  T 

8i  ■  Di  BI 


i? 


~T  T 

VI 


1-1 


~-l„T 

Di  Fi-1 


Vk  '  Dl+k  -  dlas(El-l!:I-l>  - 

for  a  total  of  ~13m  arithmetic  operations. 

Solving  the  system  z  =  C  ^r  will  still  require  that  some 
interprocessor  data  transfer  occur.  During  the  forward  substitution 
phase,  processor  i  will  need  from  processor  i-1  the  m/2  elements  of  tj_^ 
corresponding  to  an<^  from  processor  i+1  the  m/2  elements  of 

t^+1  corresponding  to  A  similar  set  of  transfers  will  be 

required  during  backward  substitution,  except  involving  elements  from 
zi+k_l  and  zi+k+l’  7516  entire  process  of  solving  z  =  C  *r  will  require 
~18m  arithmetic  operations  and  2ra  data  transfers  per  processor. 


As  I  have  indicated,  only  a  handful  of  the  chosen  preconditioning 
strategies  can  be  efficiently  implemented  on  our  multiprocessor.  The 
DS,  BDS  and  RBIC(n)  methods  can  be  implemented  regardless  of  the  grid 
point  ordering  scheme  used.  The  IC(0)  method,  on  the  other  hand,  is 
sensitive  to  the  structure  of  the  matrix  A.  Only  when  matrix  A  has  a 
structure  similar  to  that  of  the  point  red/black  matrix  or  line 
red/black  matrix  can  the  IC(0)  factorization  be  done  efficiently. 
Implementation  using  the  point  red/black  matrix  has  an  added  advantage 
in  that  the  factorization  can  be  performed  without  interprocessor 


communication 


52 


4.6.  Phase  III 
4_.6«  1_.  Introduction 

From  the  results  of  Phase  I  and  Phase  II,  the  following 
combinations  of  preconditioning  strategies  and  grid  point  ordering 
schemes  are  chosen  for  further  analysis: 

1)  IC(O)  with  a  point  red/black  matrix 

2)  BDS  with  a  line  red/black  matrix 

3)  RBIC(O)  with  a  2-line  red/black  matrix. 

For  comparison  purposes,  I  also  consider  a  naturally  ordered  matrix  with 
no  preconditioning.  These  combinations  are  compared  relative  to  test 
matrices  of  order  ~1000  arising  from  test  problem  1  (n“32) ,  test 
problem  2,  and  test  problem  3  (see  appendix  D).  The  numerical 
experiments  are  similar  to  those  conducted  during  Phase  I. 

The  size  of  these  problems  made  calculating  all  the  distinct 
eigenvalues  of  the  symmetrically  preconditioned  test  matrices  extremely 
expensive.  I  therefore  limit  myself  to  examining  only  the  extreme 
eigenvalues.  In  each  case,  I  calculate  the  number  of  distinct 
eigenvalues  in  the  interval  [0.0  ,  1.2].  Then,  using  the  estimate  of 
the  spectral  radius  (p)  generated  by  the  Harwell  routine  EA14A,  I 
calculate  the  number  of  distinct  eigenvalues  in  the  upper  part  of  the 
spectrum  defined  by  the  interval  [0.8p  ,  p] .  Of  primary  interest  Is  the 


number  of  eigenvalues  that  migrated  into  the  lower  part  of  the  spectrum 
as  a  result  of  the  preconditioning.  The  greater  the  number  of 
eigenvalues  in  the  interval  [0.0  ,  1.2],  the  more  successful  the 
preconditioning  strategy. 

Finally,  the  effect  of  different  values  of  <(»,  the  cost  In  time 
units  to  transfer  a  piece  of  data  between  neighboring  processors,  on  the 
efficiency  of  each  of  the  preconditioning  strategies  is  examined.  For 
each  problem,  I  calculate  the  total  number  of  time  units  required  by  a 
typical  processor  in  our  system  to  generate  our  answer.  This  was  done 
using  the  following  equation: 

Total  Time  *  Preprocessing  Time 
+  [Number  of  Iterations  x  Time  Units  per  Iteration] 

where , 

Preprocessing  Time  ■  Number  of  Arithmetic  Operations 
+  [4,  x  Number  of  Data  Transfers] , 

Time  Units  per  Iteration  *  Number  of  Arithmetic  Operations/iteration 
+  [4.  x  Number  of  Data  Transfers/iteration]. 

Appendix  E  outlines  the  number  of  arithmetic  operations  and  data 
transfers  required  by  each  processor  to  perform  each  step  of  our 
preconditioned  conjugate  gradient  algorithm. 
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4.6.2.  Software 


Moat  of  the  software  used  during  this  phase  Is  similar  to  that  used 
during  Phase  I.  However,  more  efficient  routines  BDIAG,  ICCGO,  and 
RBICO  are  developed  to  implement  the  BDS,  IC(0),  and  RBIC(O) 
factorizations,  respectively.  Each  of  these  three  routines  is  based  on 
the  following  algorithm: 

Algorithm  4.1 


1) 

2) 

3) 

4) 

5) 


D  ■  diag(A) 


For  i  >  1  to  N  do 


Fo 


r  j  t  Rj  do 


Tij  ■  aij/ai 


dj  ”  dj '  Vu 


where  N  »  order(A),  and 
X 

set  R^  defines  which  columns  in  row  i  are  to  be 

used  in  calculating  the  factorization. 

X 

For  these  three  routines,  the  following  is  how  set  R^  is  defined: 
BDS  -  R^  -  {  j  1  j-i+1  and  a^O  } 

IC(0)  -  R^  -  {  j  I  i<j  and  a^*)  } 

RBIC(O)  -  R^  ■  {  j  I  Kjci+ra  and  a^^o  }. 


4^.  6^.3 .  Results 

The  data  from  the  numerical  experiments  can  be  found  at  the  end  of 
this  section.  Tables  4.10,  12,  and  14  contain  the  timing  and 
convergence  data  pertaining  to  solving  each  of  the  test  problems. 
Tables  4.11,  13,  and  15  contain  the  corresponding  data  on  the  extreme 
eigenvalues  of  the  symmetrically  preconditioned  test  matrices.  Graphs 
4.6  -  4.8  show  the  log^g  C'ie  norm  c^e  residual  as  a  function  of 
the  Iteration  number.  Graphs  4.9  -  4.11  show  what  effect  the 
Interprocessor  communications  cost  (c| i)  can  have  on  the  amount  of  work 
required  by  each  processor  to  calculate  an  acceptable  answer. 

Notice  that  In  these  cases,  the  time  required  to  perform  the 
desired  factorization  Is  trivial  when  compared  to  that  required  to 
actually  solve  the  system.  This  would  Indicate  that  the  savings  incured 
by  using  the  point  red/black  ordering  with  the  IC(0)  method,  as  opposed 
to  using  the  line  red/black  ordering  or  block  cyclic  reduction,  may  not 
be  that  significant  in  the  long  run.  However,  unless  circumstances 
dictate  otherwise,  there  is  no  reason  not  to  utilize  the  point  red/black 
ordering  and  enjoy  what  savings  it  can  provide. 

For  these  test  problems,  the  RBIC(O)  method  prove  at  least  an  equal 
to  the  IC(0)  method  in  efficiency.  Only  in  the  case  of  test  problem  1 
does  the  IC(0)  prove  more  efficient  than  the  RBIC(O)  method.  The  two 
methods  are  extremely  close  in  the  number  of  iterations  required  to 
solve  the  test  problems.  The  RBIC(O),  therefore,  has  a  slight  advantage 
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In  that  each  iteration  requires  fewer  arithmetic  operations  due  to  the 
fewer  non-zero  elements  in  the  upper  triangular  part  of  its 
factorization.  The  BDS  method  is  consistently  a  distant  third,  though 
it  does  represent  a  improvement  over  no  preconditioning. 

Unfortunately,  the  matrices  symmetrically  preconditioned  by  the  BDS 
and  RBIC(O)  methods  consistently  require  more  than  the  750  iterations  I 
have  allotted  for  calculating  their  eigenvalues.  As  a  result,  these 
counts  may  be  incomplete,  but  should  be  reasonably  close.  The  BDS 
method  results  in  substantial  improvement  in  the  eigenvalue  distribution 
as  compared  to  the  matrix  without  preconditioning.  The  RBIC(O)  and 
IC(0)  methods  then  each  register  moderate  subsequent  improvements.  The 
IC(0)  method,  as  would  be  expected,  produces  the  "best"  eigenvalue 
distribution  of  the  three.  It  records  the  smallest  spectral  radius  and 
causes  the  greatest  number  of  eigenvalues  to  migrate  into  the  lower 
interval. 

Looking  at  Graph  4.9  -  4.11,  we  see  that  as  the  cost  to  transfer  a 
piece  of  data  between  neighboring  processors  (<J>)  increases,  the 
advantages  of  using  the  RBIC(O)  factorization  also  increase.  For 
<j»  m  10,  which  may  not  be  unrealistic  for  loosely  connected  processors, 
the  RBIC(O)  saves  between  1500  and  60,000  time  units  over  the  IC(0) 
method.  This  advantage  stems  from  the  fact  that  the  RBIC(0)  method 
requires  no  data  transfers  to  solve  the  system  z  *  C  *r,  while  the  IC(0) 
method  requires  2m  data  transfers. 


Test  Problem  1  (n“32) 


Preconditioning 

Method 


None 

BDS 

RBIC(O) 

1C(0) 


FACTOR 


Number  of 

Elements  in  L 

Time 

0 

0.0 

992 

0.03 

1504 

0.03 

1984 

0.02 

SOLVE 


Number  of 

Iterations 

Time 

92 

2.98 

70 

2.77 

49 

2.03 

46 

1.92 

Table  4.10  -  Timing  and  convergence  data  pertaining  to  solving  the  given 

-6 

test  problem,  such  that  l  lr 1 1 <10 


Preconditioning 

Spectral 

Number  of 

Number  of 

Method 

Radius 

Eigenvalues  ^ 

In  lower  Interval  in 

Eigenvalues 
upper  interval' 

None 

8.664 

10 

9 

BDS 

2.281 

30* 

4 

r 

RBIC(0) 

1.796 

32* 

6 

o 

IC(0) 

1.591 

46 

J 

*  Number  of  Eigenvalues  found 

after  750  iterations. 

Table  4.11  -  Data 

on  the  extreme  Eigenvalues 

l 
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Test  Problem  2 


Preconditioning 

FACTOR 

SOLVE 

Number  of 

Number  of 

Me  thod 

Elements  in  L 

Time 

Iterations 

Time 

None 

0 

0.0 

99* 

2.98 

BDS 

960 

0.03 

104 

3.75 

RBIC(O) 

1456 

0.03 

75 

2.86 

IC(0) 

1921 

0.02 

75 

2.99 

*  I  |rgg ! I  =  0.105E-02 

Table  4.12  -  Timing  and  convergence  data  pertaining  to  solving  the  given 


test  problem,  such 

that  ||r || <10 

• 

Preconditioning 

Spectral 

Number  of 

Number  of 

Method 

Radius 

Eigenvalues  ^ 
lower  interval 

Eigenvalues 

in 

in  upper  interval' 

None 

8.579 

9 

11 

BDS 

2.257 

29* 

5 

RBIC(0) 

1.798 

35* 

5 

IC(0) 

1.684 

49 

3 

*  Number  of  Eigenvalues  found  after  750  iterations. 
Table  4.13  -  Data  on  the  extreme  Eigenvalues. 


Test  Problem  3 


Preconditioning 

FACTOR 

SOLVE 

Number  of 

Number  of 

Method 

Elements  in  L 

Time 

Iterations 

Time 

None 

0 

0.0 

99* 

3.02 

BDS 

960 

0.03 

93 

3.64 

RBIC(O) 

1456 

0.02 

68 

2.60 

IC(0) 

1921 

0.02 

66 

2.66 

*  i|r9g||  =  0. 288E-02 

Table  4.14  -  Timing  and  convergence  data  pertaining  to  solving  the  given 


test  problem,  such 

that  |  |r 1 1 <10  ® 

• 

Preconditioning 

Spectral 

Number  of 

Number  of 

Method 

Radius 

Eigenvalues  ^ 
lower  interval 

Eigenvalues 

in 

in  upper  interval' 

None 

22.977 

4 

1 

BDS 

2.236 

31* 

7 

RBIC(0) 

1.903 

31* 

5 

IC(0) 

1.634 

49 

6 

*  Number  of  Eigenvalues  found  after  750  Iterations. 

Table  4.15  -  Data  on  the  extreme  Eigenvalues. 

1  -  lower  interval  defined  as  [0.0, 1.2] 

2 

-  upper  interval  defined  as  [0.8p,p]  ,  where  p  =  spectral  radius. 


2  3)oz  r>coH(/i(iui  no  ok  m<o>co  nor 


Test  Problem  1  (n“32) 


Graph  4.6  -  The  log  base  10  of  the  norm  of  the  residual  as  a  function 
of  the  iteration  number. 


2xdz  r>cDHtfldi3j  -no  oh  mco>ro  oor 
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Test  Problem  3 


ITERATION 

Graph  4.8  -  The  log  base  10  of  the  norm  of  the  residual  as  a  function 
of  the  Iteration  number. 


WDZXOCDH  2  H  0)HHZC  fTlZH-l 


Test  Problem  1  (na32) 


Graph  4.9  -  Number  of  time  units  required  to  solve  the  given  test 
problem  vs.  the  cost  in  time  units  to  transfer  one  piece  of  data 
between  two  processors. 


WDZX/1CDH  2  H  CO  — IH  2  C  P13  H— t 


Test  Problem  2 


WD2>C0CDIH  Z  H  MHHZC 


Test  Problem  3 


Graph  4.11  -  Number  of  time  units  required  to  solve  the  given  test 
problem  vs.  the  cost  in  time  units  to  transfer  one  piece  of  data 
between  two  processors. 
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5^  Conclusions 

As  we  have  seen,  only  a  limited  number  of  our  original 
preconditioning  strategies  proved  suitable  for  implementation  on  our 
multiprocessor.  The  DS,  BDS,  and  RBIC(O)  methods  proved  acceptable  no 
matter  which  grid  point  ordering  scheme  was  used.  The  IC(0)  method,  on 
the  other  hand,  was  only  feasible  when  teamed  with  point  red/black  or 
line  red/black  matrices.  When  point  red/black  matrices  were  used,  the 
IC(0)  factorization  could  be  performed  without  any  interprocessor 
communications . 

The  numerical  experiments  showed  that,  for  our  given  test  problems 
of  order  ~1000,  the  RBIC(O)  method,  in  most  cases,  was  more  efficient 
than  the  IC(O)  method.  This  was  especially  true  when  viewed  from  the 
standpoint  of  our  hypothetical  multiprocessor.  For  values  of  <(>>1,  the 
RBIC(O)  method  was  substantially  faster. 

While  I  realize  that  these  few  test  results  do  not  prove  that  the 
RBIC(O)  method  is  a  superior  method  in  all  cases,  they  do  indicate  that 
the  RBIC(O)  method  could  be  an  efficient  tool  for  preconditioning  on  a 
multiprocessor.  More  testing  is  needed  to  identify  the  scope  of  its 
potential. 
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Grid  point  ordering  schemes 
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Line  Red/Black  Ordering  for  n*6 
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Figure  A3 


Two  Line  Red/Black  Ordering  for  n-6 
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Figure  A4 


Appendix  B 


Matrix  block  structures 


Appendix  B  shows  the  block  structure  of  the  matrices  associated 
with  the  four  grid  point  ordering  schemes  defined  in  Appendix  A.  I 
assume  discretization  took  place  on  a  nxm  grid,  with  n  being  even. 
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T1  E1 


Natural  Ordering 


Vi  Ti 


Ti+1  Ei+1 
Ei+1  Ti+2  Ei-*-2 


E  .  T  .  E  , 
n-2  n-1  n-1 

E  .  T 
n-1  n 


where  Ei's  are  diagonal  matrices  of  order  m  and 
T^'s  are  tri-diagonal  matrices  of  order  m. 


Figure  B1 
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Line  Red/Black  Ordering 
E1 

e2  e3 


T 

12i-l 


T 

2i+l 


n-3 


n-1 


i  •  •  •  •  • 


“21-1 
E2i  E2i+1 


Lj  ~ 

n-3 

En-2  En-1 


E^  •  •  • 


•  •  •  «  •  • 


E2i-1  E2i 


2i 


J2i+1 


2(1+1) 


•  •  •  •  •  ■ 


En-3  En-2 


"n-1 


n-2 


where  and  E^ 


are  the  same  as  those  used  in  figure  B1 


Figure  B2 


'2i-l 


re  T 


2-line  Red/Black  Ordering 


'1 


I  C* 


I 


'2i-l 


Q2i+1 


I 


T  T 
F2i  C2i+1 


V3 


VI 


p-3 

,T  rT 
P-2  P-1 


2i 


<2i 


C-  .  ,  •  •  • 
2i+l 


Q2(i-fl) 


V3  Fp-2  I 
C 


V 


•*  I 

P-1 


where  k  =  n/2  and  p  =  2(  fk/2*]  ) 


T2i—  I 

E2i-1 

C  =» 

o  e21 

F  = 

0  0 

E2i-1 

T2i 

i 

0  0 

i 

E2.  0 

^  and  E^  are  the  same  as  those  used  in  figure  B1 


If  p  t  k,  ignore  last  row  and  column 
Figure  B3 
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Relating  the  block  structure  of  the  point  red/black  matrix  to  that 
of  the  naturally  ordered  matrix  is  not  as  easy  as  with  the  line 
red/black  matrix  and  the  2-line  red/black  matrix.  The  integrity  of  the 
and  blocks  is  not  maintained  during  the  reordering.  A 

relationship  does  exist  between  the  two,  but  not  at  the  block  level.  We 
find  that  the  point  red/black  matrix  (A')  and  the  naturally  ordered 
matrix  (A)  are  related  such  that 

A'  =  PTAP 

where  P  is  the  permutation  matrix 

P  =  [P^P^-.-^^^Q^...^], 
in  which  for  k  =  1,  2,  •••,  n/2 

P2k-1  =  j(k) ’ 6 j(k)+2,e j(k)+4’ " * " ,e j(k)+m-2^ ’ 

Q2k-1  =  tej(k)+rej(k)+3’"',ej(k)+m-l]’ 

P2k  =  [el(k),el(k)+2’  '”’el(k)+m-2]’ 

Q2k  =  ^el(k)-l,el(k)+l* *  *  * ,el(k)+m-3^ ’ 
with  j(k)  =  2(k-l)m+-l  and  l(k)  =  j(k)+nri-l. 

Figure  B4  outlines  the  block  structure  for  a  point  red/black  matrix. 

The  blocks  here  are  different  from  those  found  in  figures  B1  -  B3. 


Point  Red/Black  Ordering 
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and  being  diagonal  matrices  of  order  m/2 


and  and  being  upper  and  lower  bi-diagonal  matrices  of  order  ra/i 


Figure  B4 
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Appendix  C 

User  input  parameters 

Appendix  C  outlines  the  combinations  of  grid  point  ordering  schemes  and 
preconditioning  strategies  to  be  examined  during  phase  I.  The 
parameters  and  functions  required  by  subroutine  FACTOR  to  generate  each 
of  the  combinations  are  defined.  The  abbreviations  used  to  describe  the 
various  preconditioning  strategies  are  defined  in  section  4.3. 


Natural  Ordering  (NTYPE  =  0) 


—  ,  ■■■■■-—  —  - 

Preconditioning 

Strategy 

OPTION 

vector 

Functions 

1 

2 

3 

B 

5 

6 

c 

EUSE 

FILL 

None 

- 

- 

- 

- 

- 

1 

- 

- 

DS 

0 

0 

0 

l 

0 

0 

- 

EUSE1 

FILL1 

BDS 

0 

o 

u 

0 

1 

0 

0 

- 

EUSE2 

FILL1 

IC(0) 

0 

0 

0 

0 

0 

0 

- 

- 

FILL1 

IC(1) 

0 

0 

0 

0 

0 

0 

- 

- 

FILL2 

IC(3) 

0 

0 

0 

0 

0 

0 

- 

- 

FILL3 

MIC(O) 

0 

0 

0 

0 

D 

0 

- 

- 

FILL1 

HARWELL ( 10" 2  ) 

1 

1 

0 

0 

0 

io"2 

- 

- 

HARWELL (0) 

1 

1 

0 

0 

■ 

0 

0.0 

- 

- 

MICD(10~2) 

0 

1 

0 

0 

■ 

0 

io-2 

- 

- 

MICD(O) 

0 

1 

0 

0 

1 

0 

0.0 

- 

- 

Line  Red/Black  Ordering  (NTYPE  =  1) 


(Precondt it toning 

Strategy 

BDS 

IC(0) 

HARWELL(10~2) 
HARWELL (0) 
MICD(10"2) 
MICD(O) 
RBIC(O) 
MIC(O) 


OPTION 

vector 

1 

2  3 

4  5 

6 

0 

0  0 

1  0 

0 

0 

0  0 

0  0 

- 

1 

1  0 

0  1 

0 

1 

1  0 

0  1 

0 

0 

1  0 

0  1 

0 

0 

1  0 

0  1 

0 

0 

0  0 

1  0 

0 

0 

0  0 

0  1 

0 

Functions 

EUSE 

FILL 

EUSE2 

FILL1 

FILL1 

_ 

- 

- 

- 

- 

- 

- 

EUSE3 

FILL1 

- 

FILL1 

Table  C2 
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Point  Red/Black.  Ordering  (NTYPE  =  2) 


Preconditioning 

Strategy 

OPTION 

vector 

Functions 

i 

2 

3 

fl 

5 

6 

c 

EUSE 

m 

DS 

0 

0 

0 

l 

0 

0 

- 

EUSE1 

FILLl 

IC(0) 

0 

0 

0 

0 

0 

0 

- 

- 

FILL1 

HARWELLC10-2) 

1 

1 

0 

0 

1 

0 

10-2 

- 

- 

HARWELL (0) 

1 

1 

0 

0 

1 

0 

0.0 

- 

- 

MICD(10~2) 

0 

1 

0 

0 

1 

0 

10-2 

- 

MICD(O) 

0 

1 

0 

0 

1 

0 

0.0 

- 

RBIC(O) 

0 

0 

0 

1 

0 

0 

- 

EUSE4 

FILLl 

MIC(O) 

0 

0 

0 

0 

1 

0 

- 

- 

FILLl 

Table  C3 
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2  Line  Red/Black  Ordering  (NTYPE  =»  3) 


Preconditioning 

OPTION 

vector 

Functions 

Strategy 

1 

2 

3 

n 

5 

6 

c 

EUSE 

FILL 

BDS 

0 

0 

0 

i 

0 

0 

- 

EUSE2 

FILL1 

IC(0) 

0 

0 

0 

0 

0 

0 

- 

- 

FILL1 

MIC(O) 

0 

0 

0 

0 

l 

0 

- 

- 

FILL1 

HARWELL(10-2) 

1 

1 

0 

0 

1 

0 

-2 

10  z 

- 

- 

HARWELL (0) 

1 

1 

0 

0 

1 

0 

0.0 

- 

- 

MICD(10"2) 

0 

1 

0 

0 

1 

0 

10-2 

- 

- 

MICD(O) 

0 

1 

0 

0 

1 

0 

0.0 

- 

- 

RBIC(O) 

0 

0 

0 

1 

0 

0 

EUSE5 

FILL1 

RBIC(3) 

0 

0 

0 

lL 

0 

0 

EUSE  5 

FILL3 

Table  C4 
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Definitions  for  parameters  used  in  Tables  Cl  -  C4. 


OPTION(l)  -  0 
1 

0PTI0N(2)  *  0 
1 

0PTI0N(3)  -  0 
1 

OPTION (4)  -  0 
1 


0PTI0N(5)  =*  0 
1 

0PTI0N(6)  »  0 
1 


Natural  order  factorization 

Minimum  degree  factorization 

Function  FILL  used  to  control  fill-ins 

Drop  tolerance  C  used  to  control  fill-ins 

No  diagonal  scaling  prior  to  factorization 

Diagonal  elements  scaled  by  1+ABS(C)/N 

prior  to  factorization 

All  matrix  elements  used  in  calculating 

the  incomplete  factorization 

Function  EUSE  determines  which  matrix 

elements  to  use  in  calculating  the 

incomplete  factorization 

No  diagonal  modification 

Diagonal  modification  performed 

Calculate  the  desired  preconditioning  matrix 

Bypass  calculating  the  preconditioining  matrix 


C  -  Drop  tolerance  used  when  0PTI0N(2)  is  in  affect 

EUSE  -  Function  used  to  determine  which  elements  of  matrix  A 
are  to  be  used  in  calculating  the  incomplete 
factorization 


FILL 


Function  used  to  determine  which  fill-ins  to  keep 
during  the  incomplete  factorization 
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Appendix  D 

Definition  of  test  problems. 
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Test  Problem  1 


Laplace  Equation 


a2u 


a2u 


ax 


ay 


over  the  unit  square  with  Dirichlet  boundary  conditions 


(0,1) 

U=1 

(1,1) 

Ax  =  l/(n+l) 

U=1 

u=l 

Ay  =  1/ (n+1 ) 

(0,0) 

u=  1 

(1,0) 

Phase  I 

n=8  m=8 

Matrix  A  of  order  64 


Phase  III 

n=32  ®=*32 

Matrix  A  of  order  1024 


Matrix  A  will  be  a  positive  definite  M-matrix, 
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Test  Problem  2 


(0,0)  u=l  (1,0) 

Ax  =  1/31  Ay  =  1/31 
n=32  m=3 1 

Matrix  A  is  of  order  992 
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f(x,y)  - 


Test  Problem  3 

d  ,,  2  ,  2  ...  a  x  5  ,  xy5  .  ,  c/ 

-  ^((x  +  y  +  1>a?0  ■  a7(e  V°  +  u  '  f(x>y> 

ex  y(l  -  (4x*y^  +  4x2y^  +  2x2y2  +  6x2y  +  2y^  +  2y)  -  (x^  +  x'^)exy) 
over  the  unit  square  with  boundary  conditions 


(0,1) 

(2) 


(3)  (1,1) 


(4) 


(0,0)  (1)  (1,0) 

(1)  u  -  1 

(2)  IS-0 

(3)  u  +|J  =  ex2(l  +  x2) 

(4)  ^  "  W 

with  Ax  =  1/31  £y  *  1/31 
n*32  np»31 

Matrix  A  is  of  order  992 


Matrix  A  will  be  positive  definite,  but  not  an  M-matrix 
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Appendix  E 

Cost  of  Conjugate  Gradient  Algorithm 

Outlines  the  number  of  arithmetic  operations  and  data  transfers  required 
by  each  step  of  our  preconditioned  conjugate  gradient  algorithm  if 
implemented  on  our  multiprocessor.  I  assume  that  the  system  of 
equations  being  solved  Is  of  order  nm,  where  n  is  even,  and  that  our 
multiprocessor  consists  of  n/2  processors  as  arranged  in  figure  1.1. 


Preprocessing 

factorization 

CG  Algorithm 

Arithmetic 

Operations 

* 

Interprocessor 
Data  Transfers 

* 

xQ  -  C^b 

* 

* 

r  *  AXq  -  b 

2  2m- 8 

2m 

CrV/2 

.1 

<¥2m 

m/2 

g  »  C  *r 

* 

* 

e  »  -g 

0 

0 

T 

60  -  '8 

4-|m 

m/2 

Each  CG  Iteration 

f  *  Ae 

20m— 8 

2m 

X  =  6Q/eTf 

4-|m 

m/2 

x  =  x  +  \e 

4m 

0 

r  =  r  +  \f 

4m 

0 

(rTr)1/2 

,1 

4-^1 

m/2 

g  *  C-1r 

* 

* 

5j  -  rTg 

4-|m 

m/2 

P  "  6l/60 

1 

0 

60“61 

0 

0 

e  -  -g  +  pe 

4m 

0 

Table  El  -  Cost  breakdown  of  each  step  of  a  preconditioned  conjugate 
gradient  algorithm  as  Implemented  on  our  multiprocessor. 


93 


Preconditioning 

Factorization 

g  -  c" 

lr 

Method 

Arith.  Ops. 

Comm. 

Arith.  Ops. 

Comm 

BDS 

6m 

0 

10m 

0 

IC(0) 

13m 

0 

18m 

2m 

RBIC(O) 

9m 

0 

14m 

0 

Table  E2  -  Cost  breakdown  of  the  preconditioning  method  dependent 
from  Table  El  for  the  preconditioning  methods  considered 
Phase  III. 


Preconditioning 

Preprocessing 

1  CG  iteration 

Method 

Arith.  Ops. 

Comm. 

Arith.  Ops. 

Comm 

None 

31m 

3m 

4*|m 

4” 

BDS 

57m 

3m 

55^m 

4° 

IC(0) 

80m 

7m 

63^m 

4® 

RBIC(O) 

68m 

3m 

59|m 

4“ 

items 

during 


Table  E3  -  Outlines  the  costs  associated  with  the  preprocessing  stage 
and  each  CG  iteration  for  the  preconditioning  strategies  considered 
during  Phase  III  if  implemented  on  our  multiprocessor. 
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Appendix  F 


Program  Listings 
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Hierarchy  Phase  I  Software 


PR0G1 
.  GENA 
.  .  NORDER 

.  .  ISTORE 

.  .  MA31E* 

.  FACTOR 
.  .  EUSE 

.  .  MA31C 

.  .  .  MA31D* 

.  .  .  FILL 

.  SOLVE 
.  .  MA31F 

.  .  .  MA31G* 

.  .  .  MA3LH 


PR0G2 
.  GENA 
.  .  NORDER 

.  .  ISTORE 

.  .  MA3IE* 

.  FACTOR 
.  .  EUSE 

.  .  MA3IC 

.  .  .  MA31D 

.  .  .  FILL 

.  GETEIG 
.  .  EA14AD* 

.  .  MA3IG2 

.  .  MA31H 

.  .  MA31G1 

* 

.  .  DSCRPT 

.  .  DSCRP2* 
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Heirarchy  Phase  III  Software 


PR0G1A/B/C/D 
.  GENA 
.  .  NORDER 

.  .  ISTORE 

.  .  MA31E* 

.  ICCGO/BDIAG/RBICO 
.  SOLVE 
.  .  MA31F 

.  .  MA31G* 

.  .  MA31H 


PR0G2A/B/C/D 
.  GENA 
.  .  NORDER 

.  .  ISTORE 

.  .  MA31E* 

.  ICCGO/BDIAG/RBICO 
.  GETEG2 
.  .  EAI4AD* 

.  .  MA3LG2 

.  .  MA31H 

.  .  MA31G1 


Program  listings  for  these  routines  are  not  Included. 

They  maybe  found  in  the  following  locations: 

MA31D,  MA31E  and  MA31G  -  Cyber  Harwell  library  as  part  of  the 

MA31A  package. 

EA14AD  -  Cyber  Harwell  library 
DSCRPT  and  DSCRP2  -  Cyber  MSL  library 
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PROGRAM  PROG 1( INPUT, OUTPUT, MESS, TAPE4-INPUT.TAPE5-MESS, 
*TAPE6-OUTPUT) 

C 

C  SOLVE  THE  LINEAR  SYSTEM  OF  EQUATIONS  ARISING  FROM 
5  C  THE  DISCRETIZATION  FOR  OUR  MODEL  PROBLEM  USING  A 
C  PRECONDITIONED  CONJUGATE  GRADIENT  ALGORITHM. 

C 

C  SUBROUTINE  GENA 
c  - 

10  C  PERFORMS  THE  DISCRETIZATION  OF  THE  CURRENT  PROBLEM. 

C  THE  USER  SPECIFIED  INPUT  PARAMETER  NTYPE  DETERMINES 
C  THE  TYPE  OF  GRID  POINT  ORDERING  SCHEME  TO  BE  USED: 

C  NTYPE  -  0  -  NATURAL 

C  1  -  LINE  RED/BLACK 

15  C  2  -  POINT  RED/BLACK 

C  3-2  LINE  RED/BLACK 

C 

C  SUBROUTINE  FACTOR 
c  - 

20  C  CALCULATES  THE  PRECONDITIONING  MATRIX  BY  INCOMPLETE 
C  FACTORIZATION.  THE  TYPE  OF  INCOMPLETE  FACTORIZATION 


C 

DONE  IS  DETERMINED 

BY 

THE  USER  SPECIFIED  OPTION  VECTOR: 

C 

OPTION(l)  * 

0 

- 

NATURAL  ORDER  FACTORIZATION 

C 

1 

- 

MINIMUM  DEGREE  FACTORIZATION 

25 

C 

0PTI0N(2)  =» 

0 

- 

FUNCTION  FILL  USED  TO  CONTROL  FILL-INS 

C 

1 

- 

DROP  TOLERANCE  C  USED  TO  CONTROL 

C 

FILL-INS 

C 

0PTI0N(3)  - 

0 

- 

NO  DIAGONAL  SCALING  PRIOR  TO 

c 

FACTORIZTION 

30 

c 

1 

- 

DIAGONAL  ELEMENTS  SCALED  BY  1+ABS(C)/N 

c 

PRIOR  TO  FACTORIZATION 

c 

0PTI0N(4)  - 

0 

- 

ALL  MATRIX  ELEMENTS  USED  IN  CALCULATING 

c 

THE  INCOMPLETE  FACTORIZATION 

c 

1 

- 

FUNCTION  EUSE  DETERMINES  WHICH  MATRIX 

35 

c 

ELEMENTS  TO  USE  IN  CALCULATING 

c 

THE  INCOMPLETE  FACTORIZATION 

c 

0PTI0N(5)  * 

0 

- 

NO  DIAGONAL  MODIFICATION 

c 

1 

- 

DIAGONAL  MODIFICATION  PERFORMED 

c 

0PTI0N(6)  - 

0 

- 

CALCULATE  THE  DESIRED  PRECONDITIONING 

40 

c 

MATRIX 

c 

1 

- 

BYPASS  CALCULATING  PRECONDITIONING  MATRIX 

L. 

c 

c 

SUBROUTINE  SOLVE 

45  C  SOLVES  THE  LINEAR  SYSTEM  USING  THE  HARWELL  MA31F 
C  PRECONDITIONED  CONJUGATE  GRADIENT  ALGORITHM. 

C  MITS  -  MAXIMUN  NUMBER  OF  ITERATIONS  ATTEMPTED 

C  EPS  -  DESIRED  ACCRACY  OF  SOLUTION  IN  TERM  OF 


L 
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C  THE  NORM  OF  THE  RESIDUAL 

50  C 

C  FOR  MORE  DETAILS  SEE  THE  INDIVIDUAL  SUBROUTINES. 

C 

REAL  A(650),B(64),W(64,3),W1(64,3) 

INTEGER  INI(200),INJ(650),IK(64,4),IW(64,4),OPTION(6) 

55  C 

COMMON/MA3 1 I/DD, LP, MP 
COMMON /MA3 1 J /LROW , LCOL, NCP , ND , IPD 
COMMON /MA3 1K/NURL, NUCL.NUAL 
COMMON /MCOMM3 /OPTION 
60  C0MM0N/MA3 1L/EPST0L.U 

C0MM0N/MA3 1M/NI , NJ, NVERSN, NTYPE 
C0MM0N/MA3 1N/MITS , EPS1 
C 

EXTERNAL  FILL.EUSE 

65  C 

DATA  DD,LP,MP/L.0,6,5/ 

DATA  EPSTOL,U/2.0E-6, 1.0E2/ 

DATA  NI.NJ/8,8/ 

DATA  IAI,IAJ,NN/200,650,64/ 

70  DATA  MITS,EPSl/50, 1 .OE-6/ 

C 

ND-NN 

C 

READ(4,*)  NTYPE, NVERSN 
75  READ(4 ,*)  (OPTION(I), 1-1,6) 

READ(4,*)  C 
C 

CALL  GENA(NN,NZ,A,INI,INJ,IAI,IAJ,W,B,IK,IW) 

C 

80  IF  (0PTI0N(6).EQ.l)  GO  TO  5 

C 

C  PERFORM  THE  DESIRED  FACTORIZATION 
C 

CALL  FACTOR(NN,NZ, A, INI, INJ,IAI,IAJ, IK, IW,W,C, FILL.EUSE) 
85  GO  TO  15 

5  CONTINUE 

C 

C  NO  PRECONDITIONING  REQUESTED 
C  GENERATE  IDENTITY  MATRIX 
90  C 

LROW-O 

DO  10  I-l.NN 

IK(I,l)-0 

IK(I,2)-I 

95  W(I,2)-l.O 

10  CONTINUE 
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1 5  CONTINUE 
C 

C  PERFORM  THE  PRECONDITIONED  CONJUGATE  GRADIENT  ITERATION 
100  C 

CALL  S0LVE(NN,NZ,A,INI,INJ,IAI,IAJ,W,IK,B,W1) 

C 

END 


SUBROUTINE  GENA(NN,NZ,A,INI,INJ,IAI,IAJ,D,B>IK,IW) 

C 

C* ***************************************************** 

C 

5  C  GENA1 

C  - 

C 

C  PERFORMS  THE  DISCRETIZATION  OF  THE  LAPLACE  EQUATION 
C  OVER  THE  UNIT  SQUARE  WITH  DIRICHLET  BOUNDARY  CONDITIONS 
10  C  USING  A  NI  X  NJ  GRID. 

C  IDENTIFIED  AS  PROBLEM  1  IN  TEXT. 

C 

C* ********  ********* ************************************ 

c 

15  C  INPUT  PARAMETERS 

c  - 

c 

C  NN  -  ORDER  OF  MATRIX  A 
C  IAI  -  SIZE  OF  ARRAY  INI 
20  C  IAJ  -  SIZE  OF  ARRAYS  INJ  AND  A 

C 

C  OUTPUT  PARAMETERS 

C  - 

C 

25  C  NZ  -  NUMBER  OF  NON-ZERO  ELEMENTS  IN  THE  UPPER 
C  TRIANGULAR  PORTION  OF  MATRIX  A 

C  A  -  ARRAY  CONTAINING  THE  NON-ZERO  ELEMENTS  IN 

C  THE  UPPER  TRIANGULAR  PORTION  OF  MATRIX  A 

C  IN  ROW  ORDER 

30  C  INI/INJ  -  ARRAYS  CONTAINING  THE  ROW/COLUMN 
C  INDICES  OF  THE  CORRESPONDING  ENTRY 

C  IN  ARRAY  A  (IE.  INI(I)  AND  INJ(I) 

C  CONTAIN  THE  ROW  AND  COLUMN  INDEX 

C  FOR  THE  ENTRY  IN  A(I)  ) 

35  C  D  -  ARRAY  CONTAINING  THE  DIAGONAL  ELEMENTS  OF 

C  MATRIX  A 

C  B  -  CONTAINS  THE  RESULTING  RIGHTHAND  SIDE 
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C  IK(I,1)  -  NUMBER  OF  ELEMENTS  IN  ARRAY  A  BELONGING 
C  TO  ROW  I 

40  C  IK(J, 2)  -  NUMBER  OF  ELEMENTS  IN  ARRAY  A  BELONGING 
C  TO  COLUMN  J 

C  IW(I)  -  POINTS  TO  THE  FIRST  ELEMENT  OF  ROW  I  IN 
C  ARRAY  A 

C 

45  C  COMMON  BLOCK  PARAMETERS 

c  - 

c 

C  LROW,LCOL,NCP, IPD, DD  -  NOT  USED 
C  ND  -  ORDER  OF  MATRIX  A 
50  C  LP  -  OUTPUT  FILE  UNIT  NUMBER 
C  MP  -  MESSAGE  FILE  UNIT  NUMBER 
C  NI  -  NUMBER  OF  GRID  POINTS  IN  THE  X  DIRECTION 
C  NJ  -  NUMBER  OF  GRID  POINTS  IN  THE  Y  DIRECTION 
C  NVERSN  -»  PROBLEM  IDENTIFIER 
55  C  NTYPE  -  DETERMINES  GRID  POINT  ORDERING  TO  BE 
C  USED.  SEE  NORDER  FOR  DETAILS 

C 

REAL  A(IAJ),B(NN),D(NN),ATYPE(4) 

INTEGER  INI ( IAI ) , IN J ( IAJ ) , IK(NN , 2 ) , IW(NN ) 

60  C 

COMMON /MA3 1 J/LROW ,LCOL, NCP , ND , IPD 
COMMON /MA3 1 I /DD , LP , MP 
COMMON/MA3 1M/NI , NJ, NVERSN , NTYPE 
C 

65  DATA  ATYPE/7HNATURAL, 7HLINE  RB,8HPOINT  RB.8H2LINE  RB/ 

WRITE(MP,2) 

2  FORMATCUH  GENA  START) 

C 

C  INITIALIZE  DATA 
70  C 

DO  5  I-l.ND 
IK(I,l)-0 
IK(I,2)=0 
IW(I)«0 

75  5  CONTINUE 

C 

CALL  TIME(AT) 

CALL  DATE(AD) 

CALL  SECOND(TIMl) 

80  C 

NNAT-0 

NZ*0 

C 

C  PROCESS  GRID  POINTS  IN  NATURAL  ORDER 
85  C  PERFORMING  THE  DISCRETIZATION 


10! 


C 

DO  100  J-l.NJ 
DO  90  1=1,  NI 
NNAT=NNAT+ 1 

90  N=NORDER(NTYPE,I,J,NNAT) 

D(N)=4.0 

B(N)-0.0 

IF  ((I.EQ.l).OR.(I.EQ.NI))  B(N)=B(N)+1.0 
IF  ((J.EQ. 1 ) .OR. (J.EQ.NJ))  B(N)=B(N)+1 .0 

95  C 

IF  (I.EQ.NI)  GO  TO  50 

NZ-NZ+1 

A(NZ)=- 1.0 

NT=NORDER(NTYPE , 1+ 1 , J , NNAT+ 1 ) 

100  CALL  ISTORE(N,NT,INI,INj,IAI,IK,ND,NZ) 

C 

50  CONTINUE 

IF  (J.EQ.NJ)  GO  TO  90 
NZ=NZ+1 

105  A(NZ)=-1 .0 

NT»NORDER(NTYPE , I , J+ 1 , NNAT+N I ) 

CALL  ISTORE ( N , NT , IN I , IN J , IAI , IK, ND , NZ ) 

C 

90  CONTINUE 
110  100  CONTINUE 

C 

C  INITIALIZE  IW(I)  TO  POINT  JUST  BEYOND  WHERE  THE 
C  LAST  COMPONENT  OF  ROW  I  WILL  BE  STORED 
C 

115  KI=1 

DO  200  1=1, ND 
KI=KI+IK(I,1) 

200  IW(I)=KI 
C 

120  C  REORDER  BY  ROWS  USING  IN-PLACE  SORT  ALGORITHM 
C 

CALL  MA31E(INI,INJ,NZ,IW,ND,A) 

C 

C  REINITIALIZE  IW(I)  TO  POINT  TO  THE  BEGINNING  OF  ROW  I 
125  C 

KK=1 

DO  210  IR= 1 , ND 
IW(IR)=KK 

210  KK=KK+I K(  I R , 1 ) 

130  DO  220  1=1, NZ 

220  INI(t)=IABS(INI(I) ) 

C 


CALL  SEC0ND(TIM2) 


102 


135 


140 


145 


150 


155 


160 


165 


170 


TIMD-TIM2-TIM1 

C 

C  OUTPUT  STATISTICS 
C 

WRITE(LP, 250)  TIMD 

250  FORMAT ( I 3H  GENA  TIME  -  .F6.3.4H  SEC) 
WRITE(LP, 260)  NVERSN 
260  FORMAT (11H  VERSION  -  ,12) 

WRITE(LP,265)  ATYPE(NTYPE+1) 

265  FORMAT (14H  MATRIX  A  HAS  ,A10,9H  ORDERING) 
WRITE(LP,270)  AD, AT 

270  FORMAT (18H  DATE  GENERATED  *  ,A10,A10) 
WRITE(LP, 280)  ND.NZ 
280  FORMAT ( 6H  ND  *  ,I4,6H  NZ  -  ,14) 

WRITE(MP, 290) 

290  FORMAT ( 9H  GENA  END) 

C 

RETURN 

END 

SUBROUTINE  ISTORE(N ,N J,  INI , INJ, IAI , IK, NP , NZ) 
C 

INTEGER  INI(IAI) , INJ(IAI) , IK(NP, 2) 

C 

C  SUBROUTINE  USED  TO  UPDATE  ROW  AND  COLUMN  COUNTS 
C 

IF  (N.GT.NJ)  GO  TO  10 
INI(NZ)=N 
IK(N, 1 )=IK(N, 1 )+l 
INJ(NZ)-NJ 
IK(NJ,2)-IK(NJ,2)+1 
GO  TO  20 
10  INI(NZ)-NJ 

IK(NJ,l)-IK(NJ,l)+l 
INJ(NZ)*N 
IK(N,2)-IK(N,2)+1 
20  CONTINUE 
RETURN 
END 


FUNCTION  NORDER(NTYPE , I , J , N ) 

C 

C  SUBROUTINE  TO  PERMUTE  AN  ELEMENT  FROM  NATURAL  ORDERING  TO 
C  ONE  OF  THE  OTHER  ORDERING  SCHEMES 
5  C 

C  NTYPE  -  0  NATURAL  ORDERING 
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10 


15 


20 


25 


30 


35 


40 


45 


C  -  1  LINE  RED /BLACK  ORDERING 

C  -  2  POINT  RED/BLACK  ORDERING 

C  *32  LINE  RED /BLACK  ORDERING 

C 

INTEGER  PTRB ,0FFST(4 ) 

COMMON /MA3 1M/NI , N J , NVERSN , NTYP 
DATA  OFFST/3 2, 512, 496,496/ 

DATA  NATURL , LINRB , PTRB ,L2RB/0, 1 ,2,3/ 

C 

NTEMP“N 

C 

IF  (NTYPE.EQ. NATURL)  GO  TO  100 
C 

IMOD»MOD( 1+1,2) 

JM0D-M0D(J+1,2) 

C 

C  DETERMINE  IF  LINE  RED-BLACK  ORDERING  REQUESTED 
C 

IF  (NTYPE.NE. LINRB)  GO  TO  20 
NTEMP«J+( (I-I )/2) *NJ 
IF  (IMOD.EQ.O)  GO  TO  15 
NTEMP-NTEMP+OFFSTCNVERSN+I ) 

15  CONTINUE 
GO  TO  100 
20  CONTINUE 
c 

C  DETERMINE  IF  POINT  RED-BLACK  ORDERING  REQUESTED 
C 

IF  (NTYPE.NE. PTRB)  GO  TO  30 
NTEMP-(N+l)/2 
IF  (IMOD.EQ.JMOD)  GO  TO  25 
NTEMP*NTEMP+OFFST (NVERSN+I ) 

25  CONTINUE 
GO  TO  100 
30  CONTINUE 

C  DETERMINE  IF  TWO  LINE  RED-BLACK  ORDERING  REQUESTED 
C 

NTEMP=J+IMOD*NJ+((I-l)/4)*NJ*2 
NIMOD*MOD( (1+1 )/2 ,2 ) 

IF  (NIMOD.EQ. 1 )  GO  TO  100 
NTEMP-NTEMP40FFST (NVERSN+1 ) 


C 

50  100  CONTINUE 

NORDER-NTEMP 

RETURN 

END 
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SUBROUTINE  FACTOR(NN, NZA, A , INI , INJ , IAI , IAJ , IK , IW , 

*W , C , FILL, EUSE) 

C 

C  SUBROUTINE  TO  CALCULATE  THE  PRECONDITIONING  MATRIX 
5  C  USING  THE  MODIFIED  HARWELL  ROUTINE  MA31C.  THIS 
C  SUBROUTINE  PERFORMS  THE  SAME  FUNCTIONS  AS  THE 
C  HARWELL  ROUTINE  MA31A.  SEE  DESCRIPTION  OF  THE  HARWELL 
C  MA31  PACKAGE  FOR  MORE  DETAILS. 

10  REAL  A(IAJ) , W(NN,3 ) 

INTEGER  IK(NN, 4) ,IW(NN,4) ,INI(IAI) ,INJ(IAJ) ,0PTI0N(6) 
LOGICAL  FILL, EUSE 
C 

C0MM0N/MA3 1 1/DD, LP,  MP 

1 5  COMMON /MA3 1 J /LROW , LCOL , NCP , ND , IPD 

COMMON /MA3 1K/NURL , NUCL, NUAL 
COMMON /MC0MM3 /OPTION 
COMMON /MA3 1M/N I , N J , NVERSN , NTYPE 
C 

20  CALL  SECOND(TIMl) 

C 

NZ-NZA 

NZP1-NZA+1 

IAJ1-IAJ-NZA 

25  C 

C  SAVE  ROW  INDEX  FILE  IK(K,1) 

C 

DO  5  K-1,NN 
5  IK(K,4)«IK(K,1) 

30  C 

IF  (OPTION(4).EQ.O)  GO  TO  18 
C 

C  ELIMINATE  THOSE  ELEMENTS  NOT  TO  BE  USED  IN  THE 
C  INCOMPLETE  FACTORIZATION  AS  DETERMINED  BY  THE 
35  C  FUNCTION  EUSE. 

C 

NZ1-NZ+1 

KK-NZ 

DO  12  K-l.NZ 
40  I-INI(K) 

J*INJ(K) 

IF  (EUSE(I,J))  GO  TO  10 
IK(I,1)-IK(I,1)-1 
IK(J,2)»IK(J,2)-1 
45  GO  TO  12 

10  CONTINUE 
KK-KK+1 
A(KK)«A(K) 


o  o  nnnn  nnrso  n  n  no 


INJ(KK)-J 
12  CONTINUE 

REBUILD  THE  START  OF  ROW  I  FILE  IW(I,1) 

KI-NZ1 

DO  14  R-l.ND 
IW(R,1)«RI 
RI-RI+IR(R, 1 ) 

14  CONTINUE 

NZ-RR-NZA 
CALL  SECOND (TIM2) 

IF  (NZ.NE.O)  GO  TO  18 

SPECIAL  CASE  OF  DIAGONAL  SCALING 

DO  15  I-l.ND 
W(I,2)-W(I,1) 

IR(I,2)-I 
L5  CONTINUE 
LROW-O 
LCOL-O 
IFLAG-0 
GO  TO  45 

CONSTRUCT  COLUMN  FILE  IW(I,2)  TO  POINT  JUST  BEYOND  WHERE  THE 
LAST  COMPONENT  OF  COLUMN  I  WILL  BE  STORED 

18  KJ-IAI-NZ+1 
DO  20  I-l.ND 
KJ-KJ+IK(I,2) 

IW(I,2)-KJ 
10  CONTINUE 

CONSTRUCT  COLUMN  FILE  IN  HIGH  ORDER  PART  OF  INI 

DO  30  IR-l.ND 
KPP-IW(IR.l) 

KLL-KPP+IK(IR,1)-1 
IF  (KPP.GT.KLL)  GO  TO  30 
DO  25  K-KPP.KLL 
J-INJ(K) 

KR-IW(J,2)-1 
IW(J,2)-KR 
INI(KR)-IR 
25  CONTINUE 
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30  CONTINUE 
C 

C  TRANSFER  INPUT  MATRIX  TO  TAIL  END  OF  ARRAY  A 
100  C  AND  MODIFY  INJ  TO  REFLECT  THE  MOVE 
C 

NUAL-IAJ+1 
DO  40  II-l.ND 
I*ND-II+1 

105  W(I,2)-W(I,1) 

KP-IW(I.l) 

KL-KP+IK(I,1)-1 
IF  (KP.GT.KL)  GO  TO  38 
DO  35  KK-KP,KL 
110  K-KP4KL-KK 

NUAL-NUAL-1 
A(NUAL)-A(K) 

INJ(NUAL)-INJ(K) 

35  CONTINUE 

115  38  IW( I , 1 ) -NUAL-NZA 

40  CONTINUE 
C 

C  INITIALIZE  COMMON  MA31J  AND  MA31K  VARIABLES 
C 

120  LCOL-NZ 

LROW-NZ 
NURL-0 
NUCL-IW(1,2) 

NUAL-NUAL-NZA 
125  IFLAG-0 

NCP-0 
C 

CALL  SECOND(TIM2) 

C 

130  C  PERFORM  THE  FACTORIZATION 
C 

CALL  MA31C(ND,NZ,W(1,2),A(NZP1),INI,INJ(NZP1), 
1 I AI , IAJ l,IK,IW,IW(l,3),W(lf3), IFLAG , C ) 

C 

135  45  CALL  SEC0ND(TIM3) 

C 

C  RESTORE  INI 
C 

KP-1 

DO  56  I-l.ND 
KL*KP+IK(I ,4)-l 
IF  (KP.GT.KL)  GO  TO  56 
DO  55  K-KP.KL 


140 
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55  INI(K)-I 
145  56  KP-KL+1 

C 

C  OUTPUT  STATISTICS  ON  THE  FACTORIZATION 
C 

WRITE(LP, 58) 

150  58  FORMAT(2  5H0RESULTS  OF  FACTORIZATION) 

WRITE(LP,60)  (OPTION(I),I-l,6),C 
60  FORMAT( 1H0 , 9H0PTI0N  -  6I1,2X,4HC  -  ,F9.5) 

WRITE (LP, 65)  IFLAG 
65  FORMAT (9H  IFLAG  -  ,13) 

155  C 

C  TPD  -  TIME  REQUIRED  TO  PREPARE  DATA  ARRAYS 
C  PRIOR  TO  CALLING  MA31C. 

C  TD  -  TIME  REQUIRED  BY  MA31C  TO  PERFORM  THE 
C  FACTORIZATION. 

160  C  TDT  -  TOTAL  TIME  REQUIRED  BY  SUBROUTINE  FACTOR. 

C 

TDT-TIM3-TIM1 

TPD-TIM2-TIM1 

TD-TIM3-TIM2 

165  C 

WRITE(LP, 70)  TDT, TPD, TD 

70  F0RMAT(7H  TDT  -  ,F6.3,7H  TPD  -  ,F6.3,6H  TD  -  ,F6.3) 
C 

WRITE(LP, 85)  NTYPE.NVERN 

170  85  FORMAT ( 9H  NTYPE  -  , 12 ,2X, 10HVERSION  -  ,12) 

WRITE(LP, 90)  LROW 

90  FORMAT (2 1H0NUM  ELEMENTS  IN  L  -  ,14) 

WRITE(LP.IOO)  ND.NZA 
100  FORMAT (6H  ND  -  ,I3,7H  NZA  -  ,14) 

175  C 

150  .  CONTINUE 
RETURN 
END 
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LOGICAL  FUNCTION  EUSE(IsJ) 

C 

C  EUSE1 

C  - 

5  C 

C  ELIMINATES  ALL  OFF-DIAGONAL  ELEMENTS. 
C  USED  FOR  DIAGONAL  SCALING 
C 

EUSE*. FALSE. 

10  RETURN 

END 


LOGICAL  FUNCTION  EUSE(I,J) 

C 

C  EUSE  2 

C  - 

5  C 

C  USED  DURING  BLOCK  DIAGONAL  SCALING  (BDS) . 

C  KEEPS  ONLY  THOSE  ELEMENTS  IN  THE  TRI-DIAGONAL 
C  PORTION  OF  THE  MATRIX  A. 

C 

10  EUSE-. FALSE. 

IF  (lABS(J-I).LE.l)  EUSE* . TRUE . 

RETURN 

END 


LOGICAL  FUNCTION  EUSE(I,J) 

C 

C  EUSE3 
5  C 

C  USED  TO  GENERATE  THE  LINE  RED/BLACK  REDUCED  BLOCK  FORMAT 
C 

EUSE-. FALSE. 

ID-IABS(J-I) 

10  IF  ( (ID.LE. 1 ) .OR. (ID. EQ. 32) )  EUSE-.TRUE. 

RETURN 

END 


o  o  n  n  o  n  n  n  n  n  n  r>  n 
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LOGICAL  FUNCTION  EUSE(I,J) 
EUSE4 


USED  TO  GENERATE  THE  POINT  RED/BLACK 
REDUCED  BLOCK  MATRIX. 


EUSE-. FALSE. 

MI«(I-l/8)+l 

MJ-((J-l)/8)+l 

MD-IABS(MJ-MI) 

IF  (MD.EQ.4)  EUSE-.TRUE. 

RETURN 

END 


LOGICAL  FUNCTION  EUSE(I,J) 
EUSE5 


USED  TO  GENERATE  THE  2  LINE  RED/BLACK  REDUCED  BLOCK  FORMAT 
EUSE-. FALSE. 

IF  (IABS(J-I) .LE.8)  EUSE-.TRUE. 

10  RETURN 

END 


SUBROUTINE  MA31C(N,NZ,D,A,INI,INJ,IAI,IAJ,IK, 
1IP,IW,W>IFLAG,C) 

C 

C  MA31C  IS  PART  OF  THE  HARWELL  MA31  PACKAGE. 

5  C  SEE  ROUTINE  MA31A  FOR  DETAILS. 

C  MODIFIED  TO  CALCULATE  A  WIDER  VARIETY  OF  INCOMPLETE 
C  CHOLESKY  FACTORIZATIONS, 

C 

EXTERNAL  FILL 

10  REAL  A(IAJ) ,W(N) ,D(N) 

INTEGER  IP(N,2),OPTION(6) 

LOGICAL  CHANGE, FILL 

INTEGER  IK(N,3) ,IW(N,2) , INI(IAI) , INJ(IAJ) 
COMMON/MA3 l I /DD , LP , MP 
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1 5  COMMON /MA3 1 J/LROW , LCOL , NCP ,ND, IPD 

COMMON/MA3 IK/NURL, NUCL, NUAL 
COMMON/MCOMM3 /OPTION 
COMMON/MA3 1L/EPSTOL.U 
C 

20  C  OPTION  DETERMINES  HOW  THE  FACTORIZATION  WILL  BE  DONE 
C  OPTION ( 1 )  -  0  -  NATURAL  ORDER  FACTORIZATION 

C  -  1  -  MINIMUM  DEGREE  FACTORIZATION 

C  OPTION ( 2 )  -  0  -  FUNCTION  FILL  USED  TO  CONTROL  FILL-INS 

C  -  1  -  DROP  TOLERANCE  C  USED  TO  CONTROL  FILL-INS 

25  C  0PTI0N(3)  -  0  -  NO  DIAGONAL  SCALING  USED 

C  -  1  -  DIAGONAL  ELEMENTS  SCALED  BY 

C  1+ABS (C ) /FLOAT (N ) 

C  OPTION(4 )  -  NOT  USED  HERE 

C  0PTI0N(5)  -  0  -  DIAGONAL  MODIFICATION  NOT  CONSIDERED 

30  C  -  1  -  DIAGONAL  MODIFICATION  CORRESPONDING 

C  TO  THE  DROPPED  FILL-INS  IS  PERFORMED 

C  OPTION(6)  -  NOT  USED  HERE 

C 

C  IP(I,1),IP(I,2)  POINT  TO  THE  START  OF  ROW/COLUMN  I. 

35  C  IK(I, 1) ,IK(I,2)  HOLD  THE  NUMBER  OF  NONZEROES  IN  ROW/COLUMN  I 
C  OF  THE  LOWER  TRIANGULAR  PART  OF  A. 

C  DURING  THE  MAIN  BODY  OF  THIS  SUBROUTINE  THE  VECTORS 
C  IK(*,3),IW(*,1)  AND  IW(*,2)  ARE  USED  TO  HOLD  DOUBLY 

C  LINKED  LISTS  OF  ROWS  THAT  HAVE  NOT  BEEN  PIVOTAL  AND 

40  C  HAVE  EQUAL  NUMBER  OF  NONZEROES. 

C  IK(I,3)  HOLD  FIRST  ROW/COLUMN  TO  HAVE  I  NONZEROS  OR 
C  ZERO  IF  THERE  ARE  NONE. 

C  IW(I,1)  HOLD  ROW/COLUMN  NUMBER  OF  ROW/ COLUMN  PRIOR  TO 
C  ROW  I  IN  ITS  LIST  OR  ZERO  IF  NONE. 

45  C  IW(I,2)  HOLD  ROW/COLUMN  NUMBER  OF  ROW/COLUMN  AFTER 
C  ROW  I  IN  ITS  LIST  OR  ZERO  IF  NONE. 

C  DURING  THE  MAIN  BODY  OF  THE  SUBROUTINE  INI  AND  INJ 
C  KEEP  A  COLUMN  FILE  AND  A  ROW  FILE  CONTAINING 
C  RESPECTIVELY  THE  ROW  NUMBERS  OF  THE  NONZEROS  OF 
50  C  EACH  COLUMN  AND  THE  COLUMN  NUMBERS  OF  THE  NONZEROS 
C  OF  EACH  ROW.  THE  IP  ARRAYS  POINT  TO  THE  START 
C  POSITION  IN  INI  AND  INJ  OF  EACH  COLUMN  AND  ROW. 

C 

DATA  ZERO, ONE, CMAX/0. 0.1. 0,1. 0E20/ 

55  C 

C  INITIALIZE  IK(*,3)  AND  LOCAL  VARIABLES. 

CHANGE-. TRUE . 

IF  (C.LE.ZERO)  CHANGE-. FALSE. 

NZO-NZ 
IPD-N 

ALFA-1. 0/0. 90 
Bl  — .03 


60 


Ill 


B2-  .03 

NFILL-IAJ-NZO-N 
65  MCL-LCOL 

CO-O 

IF  (OPTION(3).NE.O)  CO-ABS(C) /FLOAT(N) 

C-C**2 
DO  5  1*1, N 

70  d(i)-(i-h:o)*d(i) 

5  IK(I,3)*0 
C 

C  SET  UP  LINKED  LISTS  OF  ROWS/COLUMNS  WITH  EQUAL  NUMBER 
C  OF  NON-ZEROS. 

75  C 

IF  (OPTION(l).NE.O)  GO  TO  9 
DO  8  1*1, N 
IW(I,1)*I-1 
IW(I,2)«I+1 
80  8  CONTINUE 

IW(N>2)*0 
IK(1 ,3)*1 
GO  TO  15 
C 

85  9  CONTINUE 

DO  10  1*1, N 
NZI-IK(I,1)+IK(I,2)+1 
IN-  IK(NZI,3) 

IK(NZI,3)»I 
90  IW(I,2)-IN 

IW(I,l)-0 

10  IF  (IN.NE.O)  IW(IN, 1 )-I 
1 5  CONTINUE 
C 

95  C  START  THE  ELIMINATION  LOOP 
DO  180  IIP-l.N 
C 

C  SEARCH  ROWS  WITH  NRJP  NONZEROS, 

DO  20  NRJP-l.N 
100  JP-IKCNRJP.3) 

IF(JP.GT.O)  GO  TO  25 
20  CONTINUE 
C 

C  ROW  JP  IS  USED  AS  PIVOT. 

105  C 

C  REMOVE  ROWS/COLUMNS  INVOLVED  IN  ELIMINATION  FROM 
C  ORDERING  VECTORS. 

C 

25  DO  45  L-1,2 

110  KPP*IP(JP,L) 
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KLL-IK(JP,L)+  KPP-1 
IF  (KPP.GT.KLL)  GO  TO  A 5 
DO  40  K-KPP.KLL 
IF  (L.EQ.2)  GO  TO  27 

115  J-INJ(K) 

GO  TO  28 

27  J-INI(K) 

28  IL-IW(J.l) 

IN-IW(J,2) 

120  IW(J,2)— 1 

IF  (OPTION(l).EQ.O)  GO  TO  40 
IF  (IN.LT.O)  GO  TO  40 
IF  (IL.EQ.O)  GO  TO  30 
IW(IL,2)*IN 

125  GO  TO  35 

30  NZ-IK(J,l)+IK(J,2)+l 
IK(NZ,3)-IN 

35  IF  (IN.GT.O)  IW(IN,1)-IL 

40  CONTINUE 

130  45  CONTINUE 

C 

C  REMOVE  JP  FROM  ORDERING  VECTORS 
IL=IW(JP,1) 

IN-IW(JP,2) 

135  IW(JP,2)  —10 

IF  (OPTION(l).NE.O)  GO  TO  54 
IK(1,3)-JP+1 
GO  TO  55 
54  CONTINUE 

140  IF  (IN.LT.O)  GO  TO  55 

NZ-IK( JP , 1 )+IK( JP ,  2  )+l 
IK(NZ,3)-IN 

IF(IN.GT.O)  IW(IN,1)-IL 

55  CONTINUE 

145  C 

C  STORE  PIVOT. 

IW(JP,  1)— IP 

C  COMPRESS  ROW  FILE  IF  NECESSARY. 

C 

150  IF(LROW+IK( JP, 1 )+IK( JP , 2 ) .GT . IAJ-N  )  C-CMAX 

IF  (NURL+IK(JP, 1 )+IK(JP, 2)  .LT.NUAL)  GO  TO  60 
CALL  MA3 ID(A,INJ, IAJ ,N,IK, IP, .TRUE.) 

60  KP-IP(JP,1) 

KL»IK(  JP, 1 )+KP- 1 

155  IP(JP,1)«NURL+1 

IF  (KP.GT.KL)  GO  TO  90 

C 

C  REMOVE  JP  FROM  COLUMNS  CONTAINED  IN  THE  PIVOT  ROW. 
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DO  85  K-KP.KL 
160  J-INJ(K) 

KPOIP(J,2) 

NZ«IK(J,2)-1 

IK(J,2)«NZ 

KLC-KPC+NZ 

165  IF  (KLC.GT.KPC)  GO  TO  65 

INI(KPC)«0 
GO  TO  80 

65  DO  70  KOKPC.KLC 

IF  (JP.EQ. INI(KC) )  GO  TO  75 
170  70  CONTINUE 

75  INI(KC)-INI<KLC) 

INKKLO-0 
80  LC0L-LC0L-1 
NURL-NURL+1 
175  INJ (NURL)-J 

A(NURL)-A(K) 

85  INJ(K)-0 
C 

C  TRANSFORM  COLUMN  PART  OF  PIVOT  ROW  TO  THE  ROW  FILE. 
180  90  KP2-IP(JP,2) 

KL2-IK(JP,2)+KP2-1 
IF  (KP2.GT.KL2)  GO  TO  100 
DO  95  K-KP2.KL2 
NURL-NURL+l 
185  LCOL-LCOL-1 

I-INI(K) 

KPR-IP(I.l) 

KLR-KPR+IK(I,1)-1 
DO  92  KR-KPR.KLR 

190  IF  (JP.EQ. INJ(KR) )  GO  TO  93 

92  CONTINUE 
93  INJ(KR)-INJ(KLR) 

A(NURL)-A(KR) 

A(KR)-A(KLR) 

195  INJ(KLR)-0 

IK(I,1)-IK(I,1)-1 
INJ(NURL)-I 
95  INIOO-O 

100  NZC-IK(JP,1)+IK(JP,2) 

200  IK(JP,1)“NZC 

IK(JP,2)«0 
C 

C  UNPACK  PIVOT  ROW  AND  CONTROL  DIAGONAL  VALUE. 
KP-IP(JP.i) 

205  KL-KP+NZC-1 

CO-EPSTOL*U 
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IF  (KP.GT.KL)  GO  TO  102 
DO  101  K-KP.KL 
AA-A(K) 

210  C0«AMAXI(C0, ABS(AA) ) 

J-INJ(K) 

W(J)-AA 

101  CONTINUE 

102  DJP-D(JP) 

215  IF  (DJP.GT.CO/U)  GO  TO  103 

IFLAG-2 

IF  (MP.GT.O)  WRITE(MP,250)  JP 

250  FORMAT ( / /44H+-  WARNING  MODIFICATION  OF  ZERO  OR  NEGATIVE, 
146H  DIAGONAL  ENTRY  HAS  BEEN  PERFORMED  IN  LOCATION, 17) 

220  D(JP)«CO 

IF  (CO.EQ.EPSTOL*U)  D(JP)=ONE 

103  IF  (KP.GT.KL)  GO  TO  179 

C 

C  PERFORM  ROW  OPERATIONS. 

225  DO  170  NC-l.NZC 

KC-IP(JP,1)+NC-1 
IR-INJ(KC) 

AL-A(KC)/D(JP) 

C 

230  C  COMPRESS  ROW  FILE  IF  NECESSARY. 

IF  (LROW+IK(IR, 1 )+IK(JP, 1 ) .GT. IAJ-N)  C-CMAX 
IF  (NURL+IK(IR,1)+IK(JP, 1) .LT.NUAL)  GO  TO  105 
CALL  MA31D(A, INJ, IAJ, N, IK, IP, .TRUE. ) 

105  KR-IP(IR.l) 

235  KRL-KR+IK(IR,1)-1 

IF  (KR.GT.KRL)  GO  TO  120 
C 

C  SCAN  THE  OTHER  ROW  AND  CHANGE  SIGN  IN  IW  FOR  EACH  COMMON 
C  COLUMN  NUMBER. 

240  DO  115  KS-KR.KRL 

J-INJ(KS) 

IF  (IW(J,2) .NE.-l)  GO  TO  115 
IW(J,2)»1 

A(KS)-A(KS)-AL*W(J) 

245  115  CONTINUE 

C 

C  SCAN  PIVOT  ROW  FOR  FILLS. 

120  DO  165  KS-KP.KL 
J-INJ(KS) 

250  C 

C  ONLY  ENTRIES  IN  THE  UPPER  TRIANGULAR  PART  ARE  CONSIDERED. 

IF  (J.LT.IR)  GO  TO  165 
IF<IW(J,2).EQ.l)  GO  TO  165 
AA-- AL*W(J) 
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255  IF(IR.NE.J)  GO  TO  122 

D(IR)-D(IR)+AA 
GO  TO  165 

122  IF  (OPTION(2 ) .NE.O)  GO  TO  123 
IF  (FILL(IR, J))  GO  TO  124 
260  IF  (OPTION(5).EQ.O)  GO  TO  165 

D(J)-D(J)+AA 
D(IR)-D(IR)+AA 
GO  TO  165 

123  IF  (AA*AA.GT.C*ABS(D(IR)*D(J) ))  GO  TO  124 

265  IF  (OPTION (5).EQ.O)  GO  TO  165 

D(J)=D(J)+AA 
D(IR)-D(IR)+AA 
GO  TO  165 

124  LROW-LROW+1 

270  IK(IR, 1 )=IK(IR,1 )+l 

C  IF  POSSIBLE  PLACE  THE  NEW  ELEMENT  NEXT  TO  THE  PRESENT  ENTRY. 

C 

C 

C  TRY  IF  THERE  IS  ROOM  AT  THE  END  OF  THE  ENTRY. 

275  IF  (KR.GT.KRL)  GO  TO  130 

IF  (KRL.EQ.IAJ)  GO  TO  125 
IF  (INJ(KRL+l).NE.O)  GO  TO  125 
KRL-KRL+1  ' 

INJ  (KRL)**J 

280  A(KRL)=AA 

GO  TO  133 
C 

C  TRY  IF  THERE  IS  ROOM  AHEAD  OF  PRESENT  ENTRY. 

125  IF  (KR.NE.NUAL)  GO  TO  126 

285  NUAL-NUAL-1 

GO  TO  127 

126  IF  (INJ(KR-l) .NE.O)  GO  TO  128 

127  KR-KR-1 
IP(IR,1)-KR 

290  INJ(KR)-J 

A(KR)-AA 
GO  TO  133 
C 

C  NEW  ENTRY  HAS  TO  BE  CREATED. 

295  128  DO  129  KK-KR.KRL 

NUAL-NUAL-1 
INJ (NUAL)-INJ (KK) 

A(NUAL)-A(KK) 

129  INJ(KK)-0 

300  C 

C  ADD  THE  NEW  ELEMENT. 

130  NUAL-NUAL-1 
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INJ (NUAL)-J 
A(NUAL)-AA 

305  IP(IR,1)-NUAL 

KR-NUAL 

KRL»KR+IK(IR,1)-1 

C 

C  CREATE  FILL  IN  COLUMN  FILE. 

310  133  NZ-IK(J,2) 

K-IP(J,2) 

KL1-K+N2-1 

LCOL-LCOL+1 

C 

315  C  IF  POSSIBLE  PLACE  NEW  ELEMENT  AT  THE  END  OF  PRESENT  ENTRY. 
IF  (NZ.EQ.O)  GO  TO  140 
IF  (KL1.EQ.IAI)  GO  TO  137 
IF  (INI(KL1+1).NE.0)  GO  TO  137 
INI(KLl+l)«IR 
320  GO  TO  160 

C 

C  IF  POSSIBLE  PLACE  ELEMENT  AHEAD  OF  PRESENT  ENTRY. 

137  IF  (K.NE.NUCL)  GO  TO  138 
IF  (NUCL.EQ. 1 )  GO  TO  140 

325  NUCL-NUCL-l 

GO  TO  139 

138  IF  (INI(K-l) .NE.O)  GO  TO  140 

139  K-K-l 
INIOO-IR 

330  IP(J,2)-K 

GO  TO  160 
C 

C  NEW  ENTRY  HAS  TO  BE  CREATED. 

140  IF  (NZ+l.LT.NUCL)  GO  TO  145 

335  C 

C  COMPRESS  COLUMN  FILE  IF  THERE  IS  NOT  ROOM  FOR  NEW  ENTRY. 

IF  (LCOL+NZ+2 . GE . IAI )  C-CMAX 

CALL  MA31D(A,INI,IAI,N,IK(1,2),IP(1,2), .FALSE.) 
K-IP(J,2) 

340  KL1-K+NZ-1 

C 

C  TRANSFER  OLD  ENTRY  INTO  NEW. 

145  IF  (K.GT.KL1 )  GO  TO  155 
DO  150  KK-K.KL1 
345  NUCL-NUCL-l 

INI(NUCL)-INIOUC) 

150  INI(KK)»0 
C 

C  ADD  THE  NEW  ELEMENT. 

350  155  NUCL-NUCL-1 
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355 


360 


365 


370 


375 


380 


385 


390 


395 


INI(NUCL)»IR 
IP(J,2)=NUCL 
160  IK(J,2)*NZ+1 
165  IW(J,2)— 1 
170  CONTINUE 
C 

C  UPDATE  ORDERING  ARRAYS. 

DO  172  K-KP.KL 
J-INJ(K) 

W(J)-0. 

A(K)»A(K)/D(JP) 

IP  (0 PTION(l).EQ.O)  GO  TO  171 
NZ-IK(J, 1)+IK(J,2)+1 
IN-IK(NZ,3) 

IW(J,2)-IN 

IW(J,1)»0 

IK(NZ,3)=J 

IF  (IN.NE.O)  IW( IN, 1 )=J 
GO  TO  172 

171  IW(J,2)-J+1 
IW(J,1)=J-1 

172  CONTINUE 

IF  (OPTION(l).EQ.O)  IW(N,2)»0 
MCL-MAXO(MCL.LCOL) 

PIVT-FLOAT(IIP) /FLOAT(N) 

C 

C  GIVE  WARNING  IF  AVAILABLE  SPACE  IS  USED  TOO  EARLY. 

IF  (C.NE.CMAX)  GO  TO  175 
IF  (IPD.LT.IIP)  GO  TO  179 
IPD-IIP 

IF  (PIVT  .GT.  .9)  GO  TO  179 
IFLAO-4 

IF  (MP.GT.O)  WRITE(MP, 260)  IIP 
GO  TO  179 

260  FORMAT( / /44H+WARNING  AVAILABLE  SPACE  USED  AT  PIVOT  STEP, 17) 
C 

C  CHANGE  C  IF  NECESSARY. 

175  IF  (.NOT.  CHANGE)  GO  TO  179 

PF ILL“F  LOAT (L  ROW-N  ZO ) /FLOAT(NFILL) 

IF  (PIVT. GT. 0.9)  GO  TO  179 
IF  (P F ILL . LT . ALF A*P I VT+B 1 )  GO  TO  176 
IF  (PFILL.LT. ALFA*P IVT+B2 )  GO  TO  179 
C-2.25*C 

176  ALFA“(1 .O-PFILL) / (0.9-PIVT) 

B1 »PFILL-PIVT*ALFA-0 . 03 
B2-B1+0.06 

C 

C  IF  THE  MATRIX  IS  FULL  THEN  STOP  THE  SPARSE  ANALYZE. 
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179  NR-N-IIP 

400  LFULL-NR* (NR- 1 ) /2 

LFULDD-IFIX(DD*FLOAT(LFULL) ) 

IF  (L COL . GE . L  FULDD . AND . NURL+LFULL . LT . I AJ )  GO  TO  185 

180  CONTINUE 
C 

405  C 

C  ELIMINATION  LOOP  TERMINATES 

C  AFTER  DEVIATION  WE  FACTORIZE  THE  REMAINING  FULL  MATRIX. 
185  IPD-IIP 
C*SQRT(C) 

410  LCOL-MCL 

IF  (.NOT.  CHANGE)  C=-C 
C 

C  THE  ORDER  OF  THE  FULL  MATRIX  IS  NR. 

C  LOOP  THROUGH  ROWS  IN  THE  ACTIVE  MATRIX  AND  STORE 
415  C  ROW  NUMBERS  IN  INI. 

KK-0 

DO  197  1*1, NR 
JP«IK(I,3) 

194  IF  (JP)196, 196, 195 
420  195  KK-KK+1 

INI(KK)-JP 

JP=IW(JP,2) 

GO  TO  194 

196  IF  (KK.EQ.NR)  GO  TO  198 
425  197  CONTINUE 

C 

C  MAKE  A  SORT  OF  ROWNUMBERS  IN  INI. 

198  IF  (NR.EQ.l)  GO  TO  200 
NRM1-NR-1 

430  DO  199  1*1 ,NRM1 

J1*I+1 

DO  199  J=J1,NR 

IF  (INI(J) .GT.INI(I) )  GO  TO  199 
JJ*INI(I) 

435  INI(I)-INKJ) 

INI(J)*JJ 

199  CONTINUE 

200  DO  201  1*1, NR 
II-INI(I) 

440  201  IW(II,1)— (IPD+I) 

C 

C  MAKE  AN  ORDERED  LIST  OF  THE  PIVOTS. 

DO  202  1*1 ,N 
IR— IW(1 , 1 ) 

445  202  IK(IR,2)*I 

C 


C  MOVE  FULL  MATRIX  TO  THE  FRONT  AND  ORDER. 
IPDPl»IPDfl 
NM1-N-1 

450  IF  (IPDP1.GT.NM1)  GO  TO  245 

DO  215  IIP-IPDP1.NM1 
JP*IK(IIP,2) 

KP«IP(JP,1) 

KL-KP+IK(JP,1)-1 

455  C 

C  MOVE  ROW  JP  TO  W. 

IF  (KP.GT.KL)  GO  TO  204 
DO  203  K-KP.KL 
J=*INJ  (K) 

460  INJ(K)«0 

203  W(J)-A(K) 

C 

C  COMPRESS  FILE  IF  NECESSARY. 

204  IF(NUEL+N-IIP.LT.NUAL)  GO  TO  205 

465  CALL  MA31D(A,INJ,IAJ,N,IK,IP, .TRUE.) 

205  IP(JP,l)«NURL-t-l 

IK(JP,1)*=N-IIP 
C 

C  MOVE  ROWS  AND  COLUMN  INDICES  INTO  PIVOTAL  ORDER. 

470  IIPP1-IIP+1 

DO  210  I-IIPP1.N 
J«IK(I,2) 

NURL-NURL+1 

A(NURL)-W(J) 

475  INJ(NURL)-J 

210  W(J)-ZERO 

215  CONTINUE 

LROW-NURL 
C 

480  C  FACTORIZE  THE  FULL  MATRIX. 

DO  240  IIP-IPDP1.NM1 
JP*IK(IIP,2) 

KPI-IP(JP,1) 

IP1-IIP+1 

485  IF  (IP1.EQ.N)  GO  TO  235 

C 

C  LOOP  THROUGH  THE  OTHER  ROW 
DO  230  J-IP1.NM1 
JJ»IK(J,2) 

490  KPJ-IP(JJ.l) 

KLJ-KPJ+IK(JJ,1)-1 

AL»A(KPI)/D(JP) 

D(JJ)-D<JJ)-AL*A(KPI) 

KK-KPI+1 
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495  DO  220  K-KPJ.KLJ 

A(K)»A(K)-AL*A(KK) 

220  KK-KK+1 
C 

C  STORE  FACTOR  AND  PROCEED  TO  NEXT  ROW. 
500  A(KPI)-AL 

KPI-KPI+1 
230  CONTINUE 
C 

C  MODIFY  LAST  DIAGONAL  ENTRY 
505  235  JJ-IK(N,2) 

AL-A(KPI)/D(JP) 

D(JJ)-D(JJ)-AL*A(KPI) 

A(KPI)-AL 
240  CONTINUE 
510  245  CONTINUE 

RETURN 
END 


LOGICAL  FUNCTION  FILL(I,J) 

C 

C  FILL1 
5  C 

C  USED  WHEN  NO  FILL-INS  ARE  TO  BE  KEPT 
C 

FILL-. FALSE. 

RETURN 

10  END 


LOGICAL  FUNCTION  FILL(I,J) 

C 

C  FILL2 

C  - 

5  C 

C  ALLOWS  ONE  DIAGONAL  OF  FILL-INS  TO  BE  KEPT 
C  ADJACENT  TO  THE  OUTER  DIAGONAL. 

C 

FILL-. FALSE. 

10  IF  (IABS(J-I) .GE.7)  FILL-. TRUE. 

RETURN 

END 
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LOGICAL  FUNCTION  FILL(I,J) 

C 

C  FILL3 

C  - 

5  C 

C  ALLOWS  THREE  DIAGONAL  OF  FILL-INS  TO  BE  KEPT. 
C  ONE  ADJACENT  TO  THE  INNER  DIAGONAL  AND  TWO 
C  ADJACENT  TO  THE  OUTER  DIAGONAL. 

C 

10  FILL-. FALSE. 

ID-IABS(J-I) 

IF  ((ID.LE.2).OR.(ID.GE.6)>  FILL-. TRUE. 

RETURN 

END 


SUBROUTINE  S0LVE(NN,NZ,A,INI,INJ,IAI,IAJ,W,IK,B,W1) 
C 

C  SUBROUTINE  WHICH  SOLVES  THE  LINEAR  SYSTEM  USING 
C  HARWELL'S  PRECONDITIONED  CONJUGATE  GRADIENT 
5  C  ROUTINE  MA31F. 

C 

C  INPUT  PARAMETERS 

c  - 

C 

10  C  NN  -  ORDER  OF  MATRIX  A. 

C  NZ  -  NUMBER  OF  NON-ZERO  ELEMENTS  IN  THE  UPPER 
C  TRIANGULAT  PORTION  OF  MATRIX  A. 

C  A  -  ARRAY  OF  LENGTH  IAJ  CONTAINING  THE  NON-ZERO 
C  OFF-DIAGONAL  ELEMENTS  OF  THE  UPPER  TRIANGULAR 

15  C  PORTION  OF  MATRIX  A  IN  THE  FIRST  NZ  LOCATIONS 

C  IN  ROW  ORDER.  LOCATIONS  NZ+1 , . . . .NZ+LROW 

C  CONTAIN  THE  NON-ZERO  OFF-DIAGONAL  ELEMENTS 

C  OF  THE  UPPER  TRIANGULAR  PORTION  OF  THE 

C  PRECONDITIONING  MATRIX  C  IN  ROW  ORDER. 

20  C  INJ  -  ARRAY  OF  LENGTH  IAJ  CONTAINING  THE  COLUMN 

C  INDICES  OF  THE  CORRESPONDING  ENTRY  IN  ARRAY  A. 

C  (IE.  INJ(K)  CONTAINS  THE  COLUMN  INDICE  FOR 

C  ENTRY  A(K),  K-l , . . . .NZ+LROW) . 

C  INI  -  ARRAY  OF  LENGTH  NZ  CONTAINING  THE  ROW  INDICES 
25  C  OF  THE  CORRESPONDING  ENTRY  IN  ARRAY  A. 

C  (IE.  INI(K)  CONTAINS  THE  ROW  INDICE  FOR 

C  ENTRY  A(K) ,  K-1,...,NZ). 

C  IAJ  -  SIZE  OF  ARRAYS  INJ  AND  A. 

C  B  -  CONTAINS  THE  RIGHTHAND  SIDE  OF  THE  SYSTEM. 

30  C  W  -  ARRAY  OF  LENGTH  3*NN  IN  WHICH  LOCATIONS 


l 
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C  1 , . . . ,NN  CONTAIN  THE  DIAGONAL  ELEMENTS  OF 

C  MATRIX  A  AND  LOCATIONS  NNfl . 2*NN  CONTAIN 

C  THE  INVERSE  OF  THE  DIAGONAL  ELEMENTS  OF  MATRIX  C. 

C  THE  REMAINING  NN  LOCATIONS  ARE  WORK  SPACE. 

35  C  W1  -  ARRAY  OF  LENGTH  3*NN  USED  AS  WORK  SPACE. 

C 

C  OUTPUT  PARAMETERS 

C  - 

C 

40  C  B  -  THE  SOLUTION  VECTOR 
C 

C  COMMON  BLOCK  PARAMETERS 
C  - 

c 

45  C  LCOL,NCP, IPD,DD  -  NOT  USED 

C  LROW  -  NUMBER  OF  NON-ZERO  ELEMENTS  IN  UPPER 
C  TRIANGULAR  PORTION  OF  THE  PRECONDITIONING 

C  MATRIX  C. 

C  ND  -  ORDER  OF  MATRIX  A  AND  C. 

50  C  LP  -  OUTPUT  FILE  DEVICE  NUMBER. 

C  MP  -  MESSAGE  FILE  DEVICE  NUMBER. 

C  MITS  -  MAXIMUM  NUMBER  OF  ITERATIONS  TO  BE  ATTEMPTED. 

C  EPS1  -  DESIRED  ACCURACY  OF  ! !R! ! 

C 

55  C  INTERNAL  VARIABLES 

c  - 

C 

C  NITER  -  ON  ENTRY  TO  MA31F  IT  CONTAINS  THE  MAXIMUM 
C  NUMBER  OF  ITERATION  TO  BE  ATTEMPTED.  ON 

60  C  RETURN  FROM  MA3LF  IT  CONTAINS  THE  NUMBER 

C  OF  ITERATIONS  PERFORMED. 

C  EPS  -  ON  ENTRY  TO  MA31F,  EPS(l)  CONTAINS  THE 
C  DESIRED  ACCURACY  FOR  HR!!.  ON  RETURN, 

C  EPS (I)  CONTAINS  THE  VALUE  OF  ! !R! !  AFTER 

65  C  ITERATION  1-1. 

C 

REAL  A(IAJ) ,W(NN,3) ,W1(NN,3) ,EPS(1 50) ,B(NN) 

INTEGER  INI(IAI) , INJ(IAJ) , IK(NN,2 ) 

C 

70  C0MM0N/MA3 1N/MITS , EPS1 

COMMON /MA3 1 I /DD , LP , MP 
C0MM0N/MA3 1 J/LROW,LCOL,NCP, ND, IPD 
C 

WRITE(MP, 5) 

75  5  FORMAT (12H  START  SOLVE) 

IAJ1-IAJ-NZ 

NZ1-NZ+1 

C 
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105 
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WRITE(LP,292) 

292  F0RMAT(3  7H0RESULTS  OF  PRECONDITIONED  CG  ROUTINE) 
IFLAG-0 
NITER-MITS 
EPS(1)-EPS1 
C 

CALL  SECOND(STRTIM) 

C 

CALL  MA31F(ND,NZ,A,W,INI,INJ,IAJ1,A(NZ1),W(1,2), 
1INJ(NZ1 ) , IK,B,W( 1,3),WI,W1(1,2),W1(1,3), 

2NITER.EPS) 

C 

CALL  SECOND(STPTIM) 

C 

NITER1-NITER+1 

IF  (EPS(NITERl) .LE.EPS1)  GO  TO  300 
WRITE(LP,295)  NITER 

295  FORMAT(20H0— WARNING  MORE  THAN,  17, 2X, 

*47HITERATI0NS  REQUIRED  TO  OBTAIN  DESIRED  ACCURACY.) 
I FLAG- 3 
C 

300  WRITE(LP, 301)  I FLAG 

301  FORMAT (20H0AFTER  MA31F  IFLAG  -  ,12) 

WRITE(LP, 305)  NITER, EPS (N ITER1 ) 

305  FORMAT (18H0NUM  ITERATIONS  -  ,I3,2X, 

*1 9HN0RM  OF  RESIDUAL  -  ,E13.5) 

RT IME -STPT IM-STRT IM 
WRITE(LP, 310)  RTIME 

310  FORMAT (12H0RUN  TIME  -  ,F7.3,4H  SEC) 

WRITE(LP, 330) 

330  FORMAT(38HONORM  OF  RESIDUAL  AFTER  EACH  ITERATION) 

DO  340  I-l.NITERl 
WRITE(LP, 335)  (I-1),EPS(I) 

335  FORMAT ( 1H  ,I3,2X,E13.5) 

340  CONTINUE 
C 

500  CONTINUE 
RETURN 
END 


i 

i 
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SUBROUTINE  MA3 IF(N,NZ ,A,D, INI, INJ, IAF ,AF,DF ,INJF ,IK, B,R, 
1  E,F, G.KMAX.EPS) 

C 

C  MA31F  IS  PART  OF  THE  HARWELL  MA31  PACKAGE. 

5  C  IT  HAS  BEEN  MODIFIED  TO: 

C  1)  HANDLE  ADDED  PARAMETER  TO  MA31H  CALLING  SEQUENCE 
C  2)  SAVE  THE  RESULTING  RESIDUAL  EACH  ITERATION 
C  3)  USE  A  RANDOM  STARTING  VECTOR. 

C  SEE  ROUTINE  MA31A  FOR  DETAILS. 

10  C 

REAL  AF(IAF) ,DF(N) ,A(NZ) , B(N) ,R(N) ,E(N) ,F(N) ,G(N) ,L,D(N) 
REAL  EPS(KMAX) 

INTEGER  INJF(IAF) ,INI(NZ) ,INJ(NZ) ,IK(N,2) 

DATA  ZERO/O.O/ 

15  C 

C  THIS  SUBROUTINE  PERFORMS  THE  ITERATIVE  PROCEDURE. 

C  THE  PRECONDITIONED  CONJUGATE  GRADIENT  METHOD  IS  USED. 
DO-ZERO 
EPS1-EPS(1)**2 

20  C 

C  COMPUTE  THE  INITIAL  SOLUTION. 

DO  10  I-1,N 
10  E(I)-RANF(I)*2.0 

CALL  MA31G(N,AF,INJF,IAF,DF,IK,E) 

25  C 

C  COMPUTE  THE  RESIDUALS  AND  INSERT  THE  INITIAL  SOLUTION  IN  B. 
CALL  MA31H(A,D,INI,INJ,NZ,N,E,R) 

Rl-ZERO 
DO  20  1*1, N 

30  R(I)*R(I)-B(I) 

R1-R1+R(I)**2 

G(I)-R(I) 

20  B(I)-E(I) 

KITR-0 

35  EPS(1)-SQRT(R1) 

IF  (R1.LT.EPS1)  GO  TO  75 
C 

C  INITIALIZE  E  AND  G. 

CALL  MA31G(N,AF,INJF ,IAF,DF,IK,G) 

40  DO  30  I-l.N 

E(I)*-G(I) 

30  D0-D0+R(I)*G(I) 

C 

C  START  ITERATION  LOOP 
45  35  KITR-KITR+1 

CALL  MA31H(A,D,INI,INJ,NZ,N,E,F) 

L-ZERO 
DO  40  I-l.N 


t 


40  L-L+E(I)*F(I) 

L-DO/L 

C 

C  AJUST  B,G  AND  R. 

Rl-ZERO 

DO  50  I-l.N 

B(I)-B(I)+L*E(I) 

R(I)-R(I)+L*F(I) 

R1»R1+R(I)*R(I) 

50  G(I)-R(I) 

EPS(KITR+1)-SQRT(R1) 

C 

C  CONTROL  THE  RESIDUAL. 

IF  (R1.LE.EPS1  .OR.  KITR.GE.(KMAX-l) )  GO  TO  75 

C 

C  PROCEED  ITERATION  . 

CALL  MA31G(N,AP, INJF , IAF , DF, IK,G) 

D 1-ZERO 
DO  60  I-l.N 
60  D1-R(I)*G(I)-H)1 

BB-D1/D0 
D0-D1 

DO  70  I-l.N 
70  E(I)— G(I)+BB*E(I) 

GO  TO  35 
C 

C  ITERATION  LOOP  TERMINATES. 

75  KMAX-KITR 
RETURN 
END 


SUBROUTINE  MA3 1H(A,D, INI, INJ.NZ ,N, B,Z) 

C 

C  MA31H  IS  PART  OF  THE  HARWELL  MA31  PACKAGE. 

C 

REAL  A(NZ),D(N),B(N),Z(N) 

INTEGER  INI(NZ) ,INJ(NZ) 

Q 

C  THIS  SUBROUTINE  CALCULATES  THE  INNER  PRODUCT  OF  A  MATRIX 
C  A  AND  A  VECTOR  B  AND  THE  RESULT  IS  RETURNED  IN  VECTOR  Z. 
C  THE  DIAGONAL  ENTRIES  OF  MATRIX  A  ARE  CONTAINED  IN  D. 

C 

C  INITIALIZE  A. 

C 
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15  DO  10  >1,N 

10  Z(I)-B(I)*D(I) 

C 

C  LOOP  OVER  THE  NON-ZEROES  IN  A. 
C 

20  IF  (NZ.LE.O)  GO  TO  100 

DO  90  K-l.NZ 
I-INI(K) 

J-INJ(K) 

Z(I)»Z(I)+A(K)*B(J) 

25  90  Z(J)-Z(J)+A(K)*B(I) 

100  RETURN 
END 


PROGRAM  PROG2( INPUT, OUTPUT .DATA, TAPE4-INPUT, TAPE6=DATA, 
*TAPE5«OUTPUT) 

C 

C  PROGRAM  TO  CALCULATE  THE  EIGENVALUES  OF  OUR 
5  C  SYMMETRICALLY  PRECONDITIONED  COEFFICENT  MATRIX 
C  USING  THE  HARWELL  EA14A  LANCZOS  ALGORITHM. 

C 

C  SUBROUTINE  GENA  AND  FACTOR 

C  - 

10  C  SEE  PROG1  FOR  DESCRIPTION 

C 

C  SUBROUTINE  GETEIG 

C  - 

C  SUBROUTINE  WHICH  CALLS  SUBROUTINE  EA14A  TO  CALCULATE 
15  C  THE  DESIRED  EIGENVALUES. 

C  MITE  -  MAXIMUM  NUMBER  OF  ITERATIONS  TO  BE  ATTEMPTED 

C  ACC  -  DESIRED  ACCURACY  OF  RESULTING  EIGENVALUES 

C  EL,ER  -  SEARCH  INTERVAL 

C 

20  C  SEE  INDIVIDUAL  SUBROUTINES  FOR  MORE  DETAILS. 

C 

REAL  A(650) ,B(64) , W(64, 3) 

INTEGER  INK200) ,INJ(650) ,IK(64,4) ,IW(64,4) ,0PTI0N(6) 

C 

25  C0MM0N/MA31I/DD,LP,MP 

COMMON /MA3 1J /LROW , LCOL , NCP , ND , IPD 
COMMON /MA3 1 K/NURL , NUCL , NUAL 
C0MM0N/MC0MM3 /OPTION 
COMMON /MA3 1N/MITE,ACC,EL, ER 
30  COMMON /MA3 1L / EPSTOL , U 

COMMON /MA3 1M/NI , N J , NVERSN , NTYPE 
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C 

EXTERNAL  FILL.EUSE 
C 

35  DATA  DD,LP,MP/1.0,6,5/ 

DATA  EPSTOL,U/2.0E-6, 1.0E2/ 

DATA  IAI,IAJ,NN/2 00, 650, 64/ 

DATA  MITE,ACC,EL,ER/600 , 1 .OE-4, 1 .0,0.0/ 

DATA  NI,NJ/8,8/ 

40  C 

ND-NN 

C 

C  GET  PARAMETERS  DETAILING  TYPE  OF 
C  PRECONDITIONING  METHOD  TO  USE. 

45  C 

READ( 4 , * )  NTYPE.NVERSN 
READ(4,*)  (OPTION(I), 1-1,6) 

READ(4,*)  C 
C 

50  CALL  GENA(NN,NZ ,A,INI, INJ, IAI.IAJ, W,B,IK, IW) 

C 

IF  <0PTI0N(6).EQ.l)  GO  TO  5 
C 

C  PERFORM  THE  DESIRED  FACTORIZATION 
55  C 

CALL  FACTOR(NN,NZ,A,INI, INJ, IAI, IAJ, IK, IW, V, C,FILL, EUSE) 
GO  TO  15 
5  CONTINUE 
C 

60  C  NO  PRECONDITIONING  REQUESTED 
C  GENERATE  IDENTITY  MATRIX 
C 
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10 
15 

70  C 

C  CALCULATE  THE  EIGENVALUES  OF  THE  PRECONDITIONED  MATRIX 
C 

CALL  GETEIG(NN,NZ , A.INI.INJ, IAI, IAJ, W, IK, B) 

C 

75  END 


LROW-O 

DO  10  I-l.NN 

IK(I,l)-0 

IK(I,2)-I 

V(I,2)-1.0 

CONTINUE 

CONTINUE 
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SUBROUTINE  GETEIG(NN,NZ ,A, INI, INJ, IAI, IAJ, W, IK) 

C 

C  SUBROUTINE  TO  CALCULATE  ALL  THE  EIGENVALUES  OF 
C  OUR  SYMMETRICALLY  PRECONDITIONED  INPUT  MATRIX 
C  USING  THE  HARWELL  EA14A  LANCZOS  ALGORITHM. 

C  SEE  SUBROUTINE  SOLVE  FOR  DESCRIPTION  OF  INPUT  PARAMETERS. 
C 

REAL  A(IAJ) , W(NN,3) 

REAL  EIG(1024) ,U(1024) ,V(1 024) ,T1 (1 024) ,T2( 1 024) 

REAL  X(3000 ) , DEL(3000 ) , ALFA(5000 ) , BETA(5000 ) 

INTEGER  INI(IAI) ,INJ(IAJ) ,IK(NN,4) 

INTEGER  NU(3000) 

C 

COMMON /E A1 4BD /P  RVT ( 4 ) , I PRVT ( 6 ) 

COMMON /MA3 1 I /D D , LP , MP 
COMMON/MA3 1 J/LROW.LCOL, NCP, ND, I PD 
COMMON /MA3 1 N /MI TE , ACC , EL , ER 
C 

DATA  LEIG,LX,LALFA/1 024, 3000, 5000/ 

C 

NZ1-NZ+1 
IAJ1-IAJ-NZ 
IFLAG— 1 
C 

C  A  MAXIMUM  OF  MITE  ITERATIONS  ARE  ATTEMPTED  TO 
C  ACQUIRE  ALL  EIGENVALUES  IN  THE  INTERVAL  EL  TO  ER 
C  TO  AN  ACCURACY  OF  ACC. 

C 

DO  30  ITER- 1, MITE 
C 

CALL  EA14AD(NN,EL,ER,ACC,LEIG,LX,LALFA,LP, IFLAG, 

*U, V.EIG.NEIG.X, DEL, NU, ALFA, BETA) 

C 

IF  (IFLAG. EQ.O)  GO  TO  200 
IF  (IFLAG. GT.l)  GO  TO  100 
C 

C  CALCULATES  VECTOR  U  -  VECTOR  U  +  MATRIX  A'  TIMES  VECTOR  V, 
C  WHERE  MATRIX  A'  IS  THE  RESULT  OF  SYMMETRICALLY 
C  PRECONDITIONING  MATRIX  A  BY  MATRIX  C. 

C 

CALL  MA3 1G2(NN, A(NZ1 ) ,INJ(NZ1 ) ,IAJl , W( 1,2) ,IK, V,T1) 

CALL  MA31H(A,W, INI,INJ,NZ ,NN,T1 ,T2 ) 

CALL  MA3 1G1(NN, A(NZ1 ) , INJ(NZ1 ) , IAJ l , W( 1 , 2 ) , IK, T2 ) 

C 

DO  20  I-l.NN 
U(I)«U(I)  +  T2(I) 

20  CONTINUE 
C 
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30  CONTINUE 

50  GO  TO  180 

C 

C  EA14AD  IS  SIGNALING  FAILURE 
C 

100  WRITE(LP.llO)  I FLAG 

55  WRITE(MP.llO)  IFLAG 

110  FORMAT( 2 6H0EA1 4AD  HAS  FAILED.  IFLAG-,12) 

GO  TO  290 
C 

C  EA14A  COULDN'T  FINISH  IN  THE  REQUESTED 
60  C  NUMBER  OF  ITERATIONS. 

C 

180  WRITE(LP, 185)  MITE 

WRITE(MP ,185)  MITE 

185  F0RMAT(3  9H0 — WARNING  ALL  EIGENVALUES  NOT  FOUND  IN, 
65  *13, 2X, 10HITERATIONS) 

ITER=MITE 

C 

C  OUTPUT  DATA  ON  THE  CALCULATED  EIGENVALUES 
C 

70  200  CONTINUE 

WRITE(LP,205)  PRVT(l) 

205  FORMAT ( 1 9H0SPECTRAL  RADIUS  =  ,E14.7) 

WRITE(LP, 215) 

215  FORMAT ( 3 OHODATA  ON  RESULTING  EIGENVALUES) 

75  WRITE (LP, 2 20)  ITER, ACC 

220  FORMAT (8H  ITER  -  ,I3,2X,6HACC  =  ,E13.5) 

WRITE (LP, 2 30)  NEIG 

230  FORMAT(28H  NUM  DISTINCT  EIGENVALUES  =  ,13) 

C 

80  DO  235  1=1, NEIG 

235  EIG(I)=EIG(I)-1.0 

C 

WRITE(LP, 240) 

240  FORMAT ( 2 5H0S TAT 1ST ICS  ON  EIG(I)-l.O) 

85  C 

CALL  DSCRPT(NEIG, 1 , 0 , EIG, XMN , STDV, VAR , SKW ,XKT , 

*0 , 0 , 0 , 5 , LP ) 

CALL  DSCRP2(NEIG, 1,1, 0,0, EIG, EIG,XMED,XMIN,XMAX, 
*RNGE,LP) 

90  C  - 

290  CONTINUE 
RETURN 
END 
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SUBROUTINE  MA3 1G1 (N, A, INJ, IAJ, D, IK, B) 

C 

C  SUBROUTINE  TO  SOLVE  A  SYSTEM  OF  EQUATIONS 
C  (L  TRANSPOSE)  (SQRT  D)  T  *  B 
5  C  BY  BACKWARD  SUBSTITUTION.  RESULT  IS  RETURNED 
C  IN  VECTOR  B.  BASED  ON  HARWELL  ROUTINE  MA31G. 
C  REMINDER,  ARRAY  A  CONTAINS  L  TRANSPOSE. 

C  SEE  MA31G  FOR  DESCRIPTION  OF  VARIABLES. 

C 

10  INTEGER  INJ(IAJ) ,IK(N,2) 

REAL  A(IAJ),D(N),B(N) 

C 

KP-1 

C 

15  DO  25  IIP*=1,N 

IC-IK(IIP,2) 

KL=“KP+IK(IC,1)-1 

BIC-B(IC) 

IF  (KP.GT.KL)  GO  TO  20 
20  DO  15  K=*KP,KL 

IR=INJ(K) 

15  B(IR)=»B(IR)-A(K)*BIC 
20  KP-KL+1 
25  CONTINUE 

25  C 

DO  30  I-l.N 
B(I)=»B(I)  /SQRT(DCI)  ) 

30  CONTINUE 
C 

30  RETURN 

END 


SUBROUTINE  MA31G2(N,A,INJ,IAJ,D,IK,B,T) 

C 

C  SUBROUTINE  TO  SOLVE  A  SYSTEM  OF  EQUATIONS 
C  (SQRT  D) (L)  T  -  B 

5  C  BY  FORWARD  SUBSTITUTION.  BASED  ON  THE  HARWELL 
C  ROUTINE  MA31G.  SEE  MA31G  FOR  DESCRIPTION  OF 
C  VARIABLES . 

C  REMINDER,  ARRAY  A  CONTAINS  L  TRANSPOSE. 

C 

10  INTEGER  INJ(IAJ) ,IK(N,2) 

REAL  A(IAJ) ,D(N) ,B(N) ,T(N) 

COMMON /MA3 1 J /LROW , LCOL , NCP , ND , IPD 
C 
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15  C 


10 

C 


20 


25 


20 

25 

30  30 

C 


KL-LROW 

DO  10  I-l.N 
T(I)«B(I)/SQRT(D(I)) 

DO  30  IPI-l.N 

IIP-N+l-IPI 

IR-IK(IIP,2) 

BIR-0.0 

KP-KL-IK(IR,1)+1 
IF  (KP.GT.KL)  GO  TO  25 
DO  20  K-KP.KL 
IC-INJ(K) 

BIR-BIR-A(K)*T(IC) 

T(IR)-T(IR)+BIR 

KL-KP-1 

CONTINUE 

RETURN 

END 
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5 


10 


15 


20 


25 


30 


35 


40 


45 


C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


c 


c 


c 


P ROGRAM  PROG 1 A(0 UTPUT , DATA , TAPES -O UTPUT , TAPE6 -DATA ) 

DRIVER  PROGRAM  USED  DURING  PHASE  III. 

DESIGNED  TO  SOLVE  THE  TEST  PROBLEMS  USING: 

A)  POINT  RED/BLACK  GRID  POINT  ORDERING  SCHEME 
(NTYPE-2) 

B)  INCOMPLETE  CHOLESKY  FACTORIZATION  W/  0  DIAGONALS 
ADDED  AS  PERFORMED  BY  SUBROUTINE  ICCGO. 

THE  FOLLOWING  PROGRAM  CHANGES  ARE  REQUIRED  BY  THE 
VARIOUS  TEST  PROBLEMS: 

1)  TEST  PROBLEM  1  (GENA1) 

NVERSN  -  1  NI  -  32  NJ  =  32 

ND  -  1024 

USE  DIMENSIONS 

B(1024) ,  W(1024, 3) ,  Wl(1024,3),  IK(1024,2) 
AND  IW(1024) 

2)  TEST  PROBLEM  2  (GENA2) 

NVERSN  -  2  NI  »  32  NJ  =  31 

ND  -  992 

USE  DIMENSIONS 

B(992) ,  W(992, 3) ,  Wl(992,3),  IK(992,2) 

AND  IWC992) 

3)  TEST  PROBLEM  3  (GENA3) 

NVERSN  -  3 

EVERYTHING  ELSE  AS  PER  PROBLEM  2 
SEE  PROGRAMS  PR0G1,  GENA,  ICCGO  AND  SOLVE  FOR  MORE 
DETAILS. 

REAL  A(5000),B(1024),W(1024,3),W1(1024,3) 

INTEGER  INI ( 2000 ) , INJ ( 5000 ) , IK ( 1 02 4 , 2 ) , IW ( 1 024 ) 

C0MM0N/MA3 1 J/LROW ,LCOL, NCP , ND, IPD 
COMMON /MA3 1N/MITS , EPS 1 
C0MM0N/MA3 1L/EPST0L.U 
COMMON /MA3 1 I /DD , LP , MP 
COMMON /MA3 1M/NI , N J , NVERSN , NTYPE 

DATA  U.EPSTOL/l .0E2, 2.0E-6/ 

DATA  MITS,EPS1/100, 1 .OE-6/ 

DATA  IAI, I AJ, ND/2000, 5000, 1024/ 

DATA  DD,LP,MP/1 .0,6,5/ 

DATA  NI.NJ/32, 32/ 

DATA  NVERSN, NTYPE/1, 2/ 

CALL  GENA(ND,NZ , A,INI, INJ, IAI, IAJ , W,B,IK, IW) 

NZ1-NZ+1 

CALL  ICCG0(ND, NZ , A, INI , INJ, A(NZ1 ) , INJ(NZ1 ) , IK, 

*IW,W( 1 , 1 ),W(1 ,2) ) 
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IAJ2-NZ+LR0W 

50  CALL  S0LVE(ND,NZ,A,INI,INJ,NZ,IAJ2,W,IK,B,W) 

C 

END 


PROGRAM  PROG  IB  (OUTPUT,  DATA,  TAPE5  OUTPUT,  TAP  E6«DATA) 

C 

C  DRIVER  PROGRAM  USED  DURING  PHASE  III. 

C  DESIGNED  TO  SOLVE  THE  TEST  PROBLEMS  USING 
5  C  A)  LINE  RED/BLACK  GRID  POINT  ORDERING  SCHEME 
C  (NTYPE  -  1) 

C  B)  CHOLESKY  FACTORIZATION  OF  THE  BLOCK  TRIDIAGONAL 
C  PORTION  OF  MATRIX  A,  AS  PERFORMED  BY  SUBROUTINE  BDIAG. 

C  SEE  PROG 1 A  FOR  PROGRAM  SET-UP  ASSOCIATED  WITH  EACH  OF 
10  C  THE  TEST  PROBLEMS. 

C  SEE  PR0G1,  GENA,  BDIAG  AND  SOLVE  FOR  MORE  DETAILS. 

C 

REAL  A(5000) ,B(992) ,W(992, 3) ,W1 (992, 3) 

INTEGER  INI (2000 ) , INJ (5000 ) , IK(992 , 2 ) , IW(992 ) 

15  C 

COMMON/MA3 1 J/LROW ,LCOL, NCP , ND , IPD 
COMMON/MA3 1 N/MITS , EPS 1 
COMMON/MA3 1L/EPSTOL.U 
COMMON /MA3 1 I /DD , LP , MP 

20  COMMON /MA3 1M/NI , N J , NVERSN , NTYPE 

C 

DATA  U,EPSTOL/1.0E2,2.0E-6/ 

DATA  MITS , EPS 1/150,1. 0E-6/ 

DATA  IAI.IAJ, ND/2000, 5000, 992/ 

25  DATA  DD,LP,MP/1.0,6,5/ 

DATA  NI.NJ/32, 31/ 

DATA  NVERSN, NTYPE/ 2,1/ 

C 

CALL  GENA(ND,NZ , A, INI , INJ, I AI , IAJ , W , B , IK, IW) 

30  NZ1-NZ+1 

CALL  BDIAG(ND,NZ,A,INI, INJ, A(NZ1 ) , INJ(NZ1 ) ,IK, 
*IW,W(1,1),W(1,2)) 

IAJ2=*NZ+LROW 

CALL  SOLVE(ND,NZ,A,INI, INJ.NZ ,IAJ2, W,IK, B,W1 ) 

35  C 

END 


15  C 


PROGRAM  PROG 1C (OUTPUT, DATA, TAP E5*DUTPUT,TAPE6 “DATA) 

DRIVER  PROGRAM  USED  DURING  PHASE  III. 

DESIGNED  TO  SOLVE  TEH  TEST  PROBLEMS  USING 

A)  2  LINE  RED/BLACK  GRID  POINT  ORDERING  SCHEME 
(NTYPE  -  3) 

B)  REDUCED  BLOCK  INCOMPLETE  CHOLESKY  FACTORIZATION 
WITH  0  DIAGONALS  ADDED  AS  PERFORMED  BY  RBICO. 

SEE  PROG 1A  FOR  PROGRAM  SET-UP  ASSOCIATED  WITH  EACH  OF 
THE  TEST  PROBLEMS. 

SEE  PROG1,  GENA,  RBICO  AND  SOLVE  FOF  MORE  DETAILS. 

REAL  A(5000) , B(1024) ,W(1024, 3) ,W1 (1024, 3) 

INTEGER  INI ( 2000 ) , INJ ( 5000 ) , IK ( 1 024 , 2 ) , IW ( 1 024 ) 

C0MM0N/MA3 1 J/LROW,LCOL,NCP, ND, IPD 
COMMON /MA3 1N/MITS , EPS1 
C0MM0N/MA3 1L/EPST0L, U 
COMMON /MA3 1 I /DD, LP , MP 
C0MM0N/MA3 1M/NI , NJ, NVERSN , NTYPE 

DATA  U,EPSTOL/1.0E2,2.0E-6/ 

DATA  MITS,EPSl/100, 1 .0E-6/ 

DATA  IAI.IAJ, ND/2000, 5000, 1024/ 

DATA  DD,LP,MP/1.0,6,5/ 

DATA  NI.NJ/32, 32/ 

DATA  NVERSN, NTYPE/ 1,3/ 

CALL  GENA(NDNZ ,A,INI,INJ,IAI,IAJ,W,B,IK,IW) 

NZ1-NZ+1 

NCP-NJ 

CALL  RBICO(ND,NZ , A, INI, INJ, A(NZ1 ) , INJ(NZ1 ) ,IK, 
*IW,W( 1,1),W(1,2)) 

IAJ2-NZ+LROW 

CALL  S0LVE(ND,NZ,A,INI,INJ,NZ,IAJ2,W,IK,B,W1) 


l 
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PROGRAM  PROG 1D(0UTPUT , DATA , TAP E5 ■K) UTPUT , TAPE6 “DATA) 
C 

C  DRIVER  PROGRAM  USED  DURING  PHASE  III. 

C  DESIGNED  TO  SOLVE  THE  TEST  PROBLEMS  USING 
5  C  A)  NATURAL  GRID  POINT  ORDERING  SCHEME  (NTYPE  =■  0) 

C  B)  NO  PRECONDITIONING 

C  SEE  PROG 1A  FOR  PROGRAM  SET-UP  ASSOCIATED  WITH  EACH  OF 
C  THE  TEST  PROBLEMS. 

C  SEE  PROG 1 ,  GENA  AND  SOLVE  FOR  MORE  DETAILS. 

10  C 

REAL  A(5000) , B(1024) ,W(1024, 3) ,W1(1024, 3) 

INTEGER  INI(2000) ,INJ (5000) ,IK( 1024, 2) ,IW( 1024) 

C 

COMMON/MA3 1 J/LROW ,LCOL , NCP , ND , I PD 
15  COMMON /MA3 1N/MITS , EPS 1 

COMMON/MA3 1L/EPSTOL.U 
COMMON/MA3 1 I /DD , LP , MP 
COMMON /MA3 1M/NI.NJ, NVERSN, NTYPE 
C 

20  DATA  U.EPSTOL/l .0E2,2.0E-6/ 

DATA  MITS,EPSl/100,1.0E-6/ 

DATA  IAI.IAJ, ND/2000, 5000, 1024/ 

DATA  DD,LP,MP/1.0,6,5/ 

DATA  NI.NJ/32, 32/ 

25  DATA  NVERSN, NTYPE/ 1,0/ 

C 

CALL  GENA( ND,NZ,A,INI,INJ,IAI,IAJ,W,B,IK,IW) 

LROW-O 

DO  10  I-l.ND 
30  IK(I,l)-0 

IK(I,2)-I 
W(I,2)-1.0 
10  CONTINUE 

C 

35  WRITE(LP, 15) 

15  FORMAT (19H  NO  PRECONDITIONING) 

IAJ2-NZ+LROW 

CALL  SOLVE(ND,NZ,A,INI,INJ,NZ,IAJ2,W,IK,B,WI) 

C 

40  END 
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SUBROUTINE  GENA(NN,NZ,A,INI,INJ,IAI,IAJ,D,B,IK,IW) 

C 

C* ************************************ *********** 

c 

5  C  GENA2 

C  - 

C 

C  PERFORMS  THE  DISCRETIZATION  OF  MODEL  PROBLEM  2. 

C  SEE  GENA1  FOR  DESCRIPTION  OF  VARIABLES. 

10  C 

C* ********************************************** 

c 

REAL  A(IAJ) , B(NN) ,D(NN) ,ATYPE(4 ) 

INTEGER  INI (IAI) , INJ ( IAJ ) , IK(NN , 2 ) , IW(NN ) 

15  C 

COMMON /MA3 1 J/LROW ,LCOL, NCP , ND, IPD 
COMMON /MA3 1 I/DD , LP , MP 
C0MM0N/MA3 1M/NI , NJ, NVERSN, NTYPE 
C 

20  DATA  ATYPE/7HNATURAL, 7HLINE  RB.8HP0INT  RB.8H2LINE  RB/ 

WRITE(MP,2) 

2  F0RMAT(11H  GENA  START) 

C 

DO  5  1*1, ND 
25  IK(I,l)-0 

IK(I,2)-0 
IW(I)-0 
5  CONTINUE 
C 

30  CALL  TIME(AT) 

CALL  DATE (AD) 

CALL  SECOND (TIM1 ) 

C 

NNAT-0 

35  NZ-0 

C 

DO  100  I-l.NI 
DO  90  J-l.NJ 
NNAT-NNAT+l 

40  n«norder(ntype,i,j,nnat) 

D(N)«4.0 

B(N)-0.0 

IF  (I.EQ.l)  D(N)»D(N)/2.0 
IF  (J.EQ.NJ)  D(N)«D(N)/2.0 
IF  (I.EQ.NI)  D(N)-D(N)/2.0 
IF  (J.NE.l)  GO  TO  10 
B(N)«1.0 

IF  ((I.EQ.l), OR. (I.EQ.NI))  B(N)-0.5 


45 


137 


10  CONTINUE 

50  C 

C  CALCULATE  INNER  DIAGONAL 
C 

IF  (J.EQ.NJ)  GO  TO  20 
NZ-NZ+1 

55  A(NZ)~ 1.0 

IF  ((I.EQ.l).OR.(I.EQ.NI))  A(NZ)— 0.5 
NT-NORDER(NTYPE , I , J+l , NNAT+1 ) 

CALL  ISTORE(N , NT , INI , INJ , IAI , IK, ND, NZ) 

20  CONTINUE 

60  C 

C  CALCULATE  OUTER  DIAGONAL 
C 

IF  (I.EQ.NI)  GO  TO  90 
NZ-NZ+1 

65  A(NZ)— 1.0 

IF  (J.EQ.NJ)  A(NZ)— 0.5 
NT-NORDER(NTYPE , 1+1 , J , NNAT4NJ ) 

CALL  ISTORE(N,NT,INI, INJ, AI, IK, ND,NZ) 

90  CONTINUE 
70  100  CONTINUE 

C 

C  INITIALIZE  IW(I)  TO  POINT  JUST  BEYOND  WHERE  THE 
C  LAST  COMPONENT  OF  ROW  I  WILL  BE  STORED 
C 

75  KI-1 

DO  200  I-l.ND 
KI-KI+IK(I,1) 

200  IW(I)-KI 
C 

80  C  REORDER  BY  ROWS  USING  IN-PLACE  SORT  ALGORITHM 
C 

CALL  MA31E(INI,INJ,NZ,IW,ND, A) 

C 

C  REINITIALIZE  IW(I)  TO  POINT  TO  THE  BEGINNING  OF  ROW  I 
85  C 

KK-1 

DO  210  IR-l.ND 
IW(IR)«KK 

210  KK-KK+I K( I R , 1 ) 

90  DO  220  I-l.NZ 

220  INI(I)-IABS(INI(I) ) 

C 

CALL  SEC0ND(TIM2) 

TIMD-TIM2-TIM1 


95  C 


WRITE(LP,250)  TIMD 
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250 

FORMAT ( 1 3H  GENA  TIME  -  ,F6.3,4H  SEC) 
WRITE(LP,260)  NVERSN 

260 

FORMAT (11H  VERSION  -  ,12) 

100 

WRITE(LP,265)  ATYPE(NTYPE+1) 

265 

FORMAT (14H  MATRIX  A  HAS  ,A10,9H  ORDERING) 
WRITE(LP,270)  AD, AT 

270 

FORMAT (18H  DATE  GENERATED  -  ,A10,A10) 
WRITE(LP,280)  ND,NZ 

105 

280 

FORMAT (6H  ND  -  ,I4,6H  NZ  -  ,14) 

WRITE(MP, 290) 

290 

C 

FORMAT (9H  GENA  END) 

RETURN 

no 

END 

SUBROUTINE  GENA(NN , NZ , A, INI , INJ , I AI , I AJ , D , B , IK, IW) 

C 

C* ****************************************** ******** 

c 

5  C  GENA3 

C  - 

C 

C  PERFORMS  THE  DISCRETIZATION  OF  MODEL  PROBLEM  3. 

C  SEE  GENA1  FOR  DESCRIPTION  OF  VARIABLES. 

10  C 

Q*************************************************** 

c 

REAL  A(IAJ) ,B(NN) ,D(NN) ,ATYPE(4) 

INTEGER  INI ( IAI ) , INJ ( IAJ ) , IK(NN , 2 ) , IW(NN ) 

15  C 

COMMON/MA3 1 J/LROW.LCOL, NCP, ND, IPD 
COMMON/ADATA/NT , NV , AD , AT 
COMMON/MA3 1 I/DD, LP, MP 
COMMON /MA3 1M/NI , NJ , NVERSN, NTYPE 

20  C 

DATA  ATYPE/7HNATURAL, 7HLINE  RB.8HPOINT  RB.8H2LINE  RB/ 
WRITE(MP,2) 

2  FORMAT(l 1H  GENA  START) 

C 

25  DO  5  I-l.ND 

IK(I,l)-0 
IK(I,2)-0 
IW(I)-0 
5  CONTINUE 

30  C 


CALL  TIME (AT) 

CALL  DATE (AD) 

CALL  SECOND(TIMl) 
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40 

C 

45  C 
C 

50 


C 

55 

C 

60  C 


NNAT-0 

NZ-0 

H-l. 0/31.0 
HD2-H/2.0 
H2»2.0*H 
HSQ«H*H 

X— H 

XP1--HD2 

XP1SQ«XP1*XP1 

DO  95  1*1, NI 

XS1-XP1 

XS1SQ-XP1SQ 

XP1*XS1+H 

XP1SQ*XP1*XP1 

X-X+H 

XSQ-X*X 

Y*0.0 
YP1-HD2 
CYP*EXP(X*YP 1 ) 

DO  95  J*1 ,NJ 

Y*Y+H 

YS1-YP1 

YPl-YSl+fl 

YSQ-Y*Y 

NNAT-NNAT+1 

N»NORDER(NTYPE , I , J , NNAT) 

AXS-XSISQfYSQH.O 
AXP-XP1SQ+YSQ+1.0 
CYS-CYP 
CYP*EXP(X*YP 1) 

GXY-4.0*YSQ*(XSQfYSQH.0)+6.0*Y 

GXY-XSQ*GXY+2 . 0*Y* (YSQ+i . 0 ) 

GXY- 1 . 0-GXY-XSQ* (XSQ+X) *EXP(X*Y ) 
GXY-EXP(XSQ*Y)*GXY 
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80 


85 


90 


95 


100 


105 


110 


115 


120 


125 


C 


C 


C 


C 

10 

15 


20 


C 

25 


C 

30 


C 

35 


40 


D (N ) -AXS+AXP+C YS+CYP+HSQ 
B(N)«HSQ*GXY 

IF  (I.EQ.l)  GO  TO  25 
IF  (I.EQ.NI)  GO  TO  50 

IF  (J.EQ.l)  GO  TO  10 
IF  (J.NE.NJ)  GO  TO  15 

D(N)=(D(N)+H2*CYP) /2 .0 

B(N)-(B(N)4fl2*CYP*EXP (XSQ)* ( 1 .O+XSQ) ) / 2.0 
NZ-NZ+1 

A(NZ)— AXP/2.0 
GO  TO  20 

B(N)=B(N)+CYS 
NZ-NZ+1 
A(NZ)-- CYP 

NT-NORDER(NTYPE , I , J+l , NNAT+1 ) 

CALL  ISTORE (N, NT , INI , INJ , IAI , IK, ND, NZ ) 

NZ-NZ+1 

A(NZ)— AXP 

NT-NORDE (NTYPE , 1+ 1 , J , NNAT+NJ ) 

CALL  ISTORE ( N , NT , INI , INJ , IAI , IK, ND , NZ ) 

GO  TO  95 

D(N)-D(N)/2.0 
B(N)-B(N)/2.0 
IF  (J.NE.l)  GO  TO  30 
B(N)-B(N)+CYS/2.0 
GO  TO  35 

IF  (J.NE.NJ)  GO  TO  35 
Tl-H2*CYP/4.0 
D(N)-D(N)/2.0+Tl 
B(N)-B(N) /2.0+T1 
NZ-NZ+1 

A(NZ)— AXP/4.0 
GO  TO  40 

NZ-NZ+1 

A(NZ)-- CYP/2.0 

NT-NORDER(NTYPE ,  I ,  J+l ,  NNAT+ 1 ) 

CALL  ISTORE (N, NT , INI , INJ , IAI , IK, ND, NZ ) 
NZ-NZ+1 

A(NZ)— AXP/2.0 

NT-NORDER(NTYPE , 1+1 , J , NNAT+NJ ) 

CALL  ISTORE(N,NT,INI,INJ,IAI,IK,ND,NZ) 


GO  TO  95 
C 

50  IF  (J.EQ.NJ)  GO  TO  55 
D(N)-D(N)/2.0 

B(N)»B(N)+4.0*H*Y*EXP(Y)*AXP 

if  (j.eq.1)  b(n)-b(n)-k:ys 

B(N)-B(N)/2.0 

NZ-NZ+1 

A(NZ)  — CYP/2.0 

NT=*NORDER(NTYPE ,  I ,  J+l ,  NNAT+1 ) 

CALL  ISTORE(N,NT,INI,INJ,IAI,IK,ND,NZ) 

GO  TO  95 
C 

55  D(N)-(D(N)+H2*CYP)/4.0 

B  (N  ) -B  (N  )  /4 . 0+H*EXP  ( 1 . 0  ) *  (  AXP+CYP  ) 

C 

95  CONTINUE 
C 

C  INITIALIZE  IW(I)  TO  POINT  JUST  BEYOND  WHERE  THE 
C  LAST  COMPONENT  OF  ROW  I  WILL  BE  STORED 
C 

KI-1 

DO  200  1-1, ND 
KI=KI+IK(I, 1 ) 

200  IW(I)»KI 
C 

C  REORDER  BY  ROWS  USING  IN-PLACE  SORT  ALGORITHM 
C 

CALL  MA3IE(INI,INJ,NZ,IW,ND, A) 

C 

C  REINITIALIZE  IW(I)  TO  POINT  TO  THE  BEGINNING  OF  ROW  I 
C 

KK-1 

DO  210  IR-l.ND 
IW(IR)-KK 

210  KK«KK+IK(IR,1) 

DO  220  I-l,NZ 
220  INI(I)-IABS(INI(I)) 

C 

CALL  SEC0ND(TIM2) 

TIMD-TIM2-TIM1 

C 

WRITE(LP,250)  TIMD 

250  FORMAT ( 1 3H  GENA  TIME  -  ,F6.3,4H  SEC) 

WRITE(LP,260)  NVERSN 
260  FORMAT  '  1 1H  VERSION  -  ,12) 

WRITE(LP,265)  ATYPE(NTYPE+1 ) 

265  FORMAT (14H  MATRIX  A  HAS  ,A10,9H  ORDERING) 
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175  WRITE (LP, 270)  AD, AT 

270  FORMAT( 1 8H  DATE  GENERATED  =  ,A10,A10) 
WRITE(LP ,280)  ND.NZ 
280  F0RMAT(6H  ND  -  ,I4,6H  NZ  -  ,14) 
WRITE(MP,290) 

180  290  F0RMAT(9H  GENA  END) 

C 

RETURN 

END 


C 
C 
C 
C 
C 

c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 

35  C 


10 


15 


20 


25 


30 


SUBROUTINE  ICCG0(NN,NZA,A,INI , INJ, C.INJC ,IK, IW,DA,DC) 

SUBROUTINE  TO  CALCULATE  THE  INCOMPLETE  CHOLESY 
FACTORIZATION  WITH  ZERO  FILL-IN  OF  THE  INPUT 
MATRIX  A. 

INPUT  PARAMETERS 


NN  -  ORDER  OF  MATRIX  A 

NZA  -  NMBER  OF  NON-ZERO  ELEMENTS  IN  THE  UPPER 
TRIANGULAR  PORTION  OF  MATRIX  A 
A  -  ARRAY  CONTAINING  THE  NON-ZERO  ELEMENTS  IN  THE 
UPPER  TRIANGULAR  PORTION  OF  MATRIX  A  IN  ROW 
ORDER 

INI/INJ  -  ARRAYS  CONTAINING  THE  ROW/ COLUMN  INDICES 
OF  THE  CORRESPONDING  ENTRY  IN  ARRAY  A. 
(IE.  INI(I)  AND  INJ(I)  CONTAIN  THE  ROW 
AND  COLUMN  INDICE  FOR  ENTRY  A(I) ) 

IK(I , 1 )  -  CONTAINS  THE  NUMBER  OF  ELEMENTS  IN 
ARRAY  A  BELONGING  TO  ROW  I 
IW(I)  -  POINTS  TO  THE  START  OF  ROW  I  IN  ARRAY  A 
DA  -  ARRAY  CONTAINING  THE  DIAGONAL  ELEMENTS  OF 
MATRIX  A 

OUTPUT  PARAMETERS 


C  -  ARRAY  CONTAINING  THE  NON-ZERO  ELEMENTS  IN  THE 
UPPER  TRIANGULAR  PORTION  OF  THE  INCOMPLETE 
CHOLESKY  FACTORIZATION 

INJC  -  ARRAY  CONTAINING  THE  COLUMN  INDEX  OF  THE 
CORRESPONDING  ENTRY  IN  ARRAY  C 
DC  -  ARRAY  CONTAINING  THE  DIAGONAL  ELEMENTS  IN 
THE  INCOMPLETE  CHOLESKY  FACTORIZATION 
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40 


45 


50 


55 


60 


65 


70 


75 


80 


C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


c 


c 

c 


2 

3 

C 


c 


5 

C 


IK(I,1)  -  NUMBER  OF  NON-ZERO  ELEMENTS  IN  ROW  I 
OF  THE  INCOMPLETE  FACTORIZATION 
IK(I,2)  -  USED  BY  OTHER  HARWELL  ROUTINES  TO 

IDENTIFY  THE  ORDER  IN  WHICH  THE  ROWS 
WERE  PROCESSED.  IN  THIS  CASE,  ROWS 
PROCESSED  IN  NATURAL  ORDER  AND 
IK(I,2)  =  I 

COMMON  BLOCK  PARAMETERS 


DD,LCOL,NCP, IPD  -  NOT  USED 
LP  -  OUTPUT  FILE  UNIT  NUMBER 
MP  -  MESSAGE  FILE  UNIT  NUMBER 

LROW  -  NUMBER  OF  NON-ZERO  ELEMENTS  IN  THE  UPPER 
TRIANGULAR  PORTION  OF  THE  INCOMPLETE 
FACTORIZATION 
ND  -  ORDER  OF  MATRIX  A 

EPSTOL  -  MINIMUM  SIZE  FOR  DIAGONAL  ELEMENT 
U  -  PARAMETER  USED  TO  DETERMINE  WHEN  A  DIAGONAL 
ELEMENT  MUST  BE  MODIFIED  TO  INSURE  POSITIVE 
DEFINITENESS 

INTEGER  IK(NN, 2 ) , IW(NN) , INI(NZA) , INJ(NZA) , INJC(NZA) 
REAL  A(NZA) ,DA(NN) ,DC(NN) ,C(NZA) 

COMMON/MA3 1 1 /D  DLP , MP 

COMMON/MA3 1 J /LROW , LCOL, NCP , ND , IPD 

COMMON /MA3 1L/EPSTOL.U 

CALL  SECOND(TIMl) 

WRITE(MP,2) 

FORMAT (12H  START  ICCGO) 

WRITE(LP, 3) 

FORMAT ( 2 6H  PRECONDITIONING  =  ICCG(O)) 

IDC-0 

CT-EPSTOL*U 

IRC-0 

DO  5  K-1,ND 
DC(K)-DA(K) 

DO  100  IROW-l.ND 
IRS-IW(IROW) 

IRE«IRSfIK(IROW,l)-l 

IK(IROW,1)«0 


IK( IROW , 2 ) “IROW 
C 

C  DETERMINE  IF  DIAGONAL  ELEMENT  MUST  BE  MODIFIED 
C  TO  PRESERVE  POSITIVE  DEFINITENESS 
C 

COCT 

IF  (IRS. GT. IRE)  GO  TO  20 
DO  10  K-IRS.IRE 
10  CO“AMAXl(CO, ABS(A(K) ) ) 

20  IF  (DC (IROW) .GT. (CO/U))  GO  TO  30 
IDC-IDC+1 
DC(IROW)«CO 

IF  (CO.EQ.CT)  DC(IR0W)“1.0 
30  CONTINUE 
C 

C  PROCESS  ELEMENTS  IN  CURRENT  ROW 
C 

IF  (IRS. GT. IRE)  GO  TO  100 
DO  90  IR=IRS , IRE 
I-INI(IR) 

J-INJ(IR) 

IRC-IRC+1 

T»A(IR) 

C(IRC)“T/DC(IROW) 

INJC(IRC)=J 

DC(J)=DC(J)-T*C(IRC) 

IK( IROW , 1 )-IK( IROW ,  1 )+ 1 
90  CONTINUE 
C 

100  CONTINUE 
C 

LROW“IRC 

CALL  SEC0ND(TIM2) 

TIMD-TIM2-TIM1 

C 

C  OUTPUT  STATISTICS 
C 

WRITE(LPf 110)  TIMD 

110  FORMAT (14H  ICCGO  TIME  »  ,F6.3,5H  SECS) 
WRITE(LP, 1 20)  LROW 
120  FORMAT (8H  LROW  -  ,14) 

IF  (IDC.NE.O)  WRITE(LP, 130)  IDC 
130  FORMAT (4H  **  ,I4,19H  DIAGONALS  MODIFIED) 
WRITE(MP, 140) 

140  FORMAT (1 OH  ICCGO  END) 

C 

RETURN 

END 
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40 


45 


SUBROUTINE  BDIAG(NN,NZA,A,INI, INJ, C.INJC , IK, IW,DA,DC) 
C 

C  SUBROUTINE  TO  CALCULATE  THE  CHOLESKY  FACTORIZATION 
C  OF  THE  TRI-DIAGONAL  PORTION  OF  THE  INPUT  MATRIX  A. 

C 

C  SEE  SUBROUTINE  ICCGO  FOR  DESCRIPTION  OF  PARAMETERS. 

C 

REAL  A(NZA) ,DA(NN) ,DC(NN) , C(NZA) 

INTEGER  IK(NN,2) , IW(NN) , INI (NZA) , INJ (NZA) , INJC (NZA) 

C 

C0MM0N/MA3 1I/DD.LP, MP 
C0MM0N/MA3 1 J/LROW ,LCOL, NCP , ND, IPD 
COMMON /MA3 1L/EPST0L.U 
C 

CALL  SECOND (TIM1 ) 

C 

WRITE(MP, 2) 

2  FORMAT (12H  START  BDIAG) 

WRITE(LP, 3) 

3  FORMAT ( 3 7H  PRECONDITIONING  -  BLOCK  TRI-DIAGONAL) 

C 

IDC-0 

CT=*EPSTOL*U 

IRC-0 

C 

DO  5  K=*1,ND 
5  DC(K)=DA(K) 

C 

DO  100  IROW-l.ND 
IRS-IW(IROW) 

IRE»IRJrt-IK(IROW,l)-l 
IK(IROW,1)»0 
IK( I ROW , 2 ) “I ROW 
C 

C  DETERMINE  IF  DIAGONAL  ELEMENT  MUST  BE  MODIFIED 
C  TO  PRESERVE  POSTIVE  DEFINITENESS 
C 

CO-CT 

IF  (IRS .GT. IRE)  GO  TO  20 
DO  10  K-IRS.IRE 
10  COAMAX1(CO,  ABS(A(K)  )  ) 

20  IF  (DC(IROW) .GT . (CO/U) )  GO  TO  30 
IDC-IDC+1 
DC(IROW)-CO 

IF  (CO.EQ.CT)  DC(IR0W)-1.0 
30  CONTINUE 
C 

C  PROCESS  ELEMENTS  IN  THE  CURRENT  ROW 
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C 

50  IF  (IRS .GT. IRE)  GO  TO  100 

DO  90  IR-IRS.IRE 
I-INI(IR) 

J-INJ(IR) 

IF  ((J-I).GT.l)  GO  TO  90 
55  IRC-IRC+1 

T-A(IR) 

C(IRC)«T/DC(IROW) 

INJC(IRC)«J 

DC(J)»DC(J)-T*C(IRC) 

60  IK(IROW,l)»IK(IROW,l)+l 

90  CONTINUE 

100  CONTINUE 
C 

LROW-IRC 

65  CALL  SECOND (TIM2 ) 

TIMD-TIM2-TIM1 

C 

C  OUTPUT  STATISTICS 
C 

70  WRITE(LP, 1 10)  TIMD 

110  FORMAT (14H  BDIAG  TIME  »  ,F6.3,5H  SECS) 
WRITE(LP, 120)  LROW 
120  FORMAT ( 8H  LROW  -  ,14) 

IF  (IDC.NE.O)  WRITE(LP, 130)  IDC 
75  130  FORMAT (4H  **  ,I4,19H  DIAGONALS  MODIFIED) 

WRITE(MP, 140) 

140  FORMAT (1 OH  BDIAG  END) 

C 

RETURN 

80  END 


SUBROUTINE  RBICO(NN,NZA,A,INI,INJ,C,INJC,IK,IW,DA,DC) 
C 

C  SUBROUTINE  TO  CALCULATE  THE  INCOMPLETE  CHOLESKY 
C  FACTORIZATION  OF  THE  QUINT-DIAGONAL  PORTION  OF 
5  C  THE  INPUT  MATRIX  A.  IT  IS  ASSUMED  THAT  THE 
C  2  LINE  RED/BLACK  ORDERING  OF  GRID  POINTS  WAS 
C  USED  IN  GENERATING  MATRIX  A. 

C 

C  SEE  SUBROUTINE  ICCGO  FOR  DESCRIPTION  OF  PARAMETERS 
10  C 

C  NCP  -  DISTANCE  FROM  MAIN  DIAGONAL  TO  OUTER  MOST 
C  DIAGONAL  TO  BE  INCLUDED  IN  THE  INCOMPLETE 
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C  FACTORIZATION. 

C 

15  REAL  A(NZA) ,DA(NN) ,DC(NN) , C(NZA) 

INTEGER  IK(NN , 2 ) , IW(NN) , INI (NZA) , INJ (NZA) , INJC(NZA) 
C 

COMMON/MA3 1I/DD.LP, MP 
COMMON /MA3 1 J /LROW , LCOL , NCP , ND , IPD 
20  COMMON /MA3 1 L/EPSTOL , U 

C 

CALL  SECOND(TIMl) 

C 

WRITE (MP, 2) 

25  2  FORMAT (12H  START  RBICO) 

WRITE(LP, 3) 

3  FORMAT(26H  PRECONDITIONING  *  RBIC(O)) 

C 

IDC-0 

30  CT-EPSTOL*U 

IROO 
C 

DO  5  K-l.ND 
5  DC(K)-DA(K) 

35  C 

DO  100  IROW* 1 , ND 
IRS-IW(IROW) 

IRE-IRS«K(IROW,  1  )-l 
IK(IR0W,1)«0 

40  IK(IR0W,2)»IR0W 

C 

C  DETERMINE  IF  DIAGONAL  ELEMENT  MUST  BE  MODIFIED 
C  TO  PRESERVE  POSITIVE  DEFINITENESS 
C 

45  CO-CT 

IF  (IRS .GT. IRE)  GO  TO  20 
DO  10  K*IRS,IRE 
10  CO'AMAX1(CO, ABS(A(K) )) 

20  IF  (DC(IROW).GT.(CO/U))  GO  TO  30 
50  IDC-IDC+1 

DC(IROW)«CO 

IF  (CO.EQ.CT)  DC(IROW)«1.0 
30  CONTINUE 
C 

55  C  PROCESS  ELEMENTS  IN  CURRENT  ROW 
C 

IF  (IRS .GT. IRE)  GO  TO  100 
DO  90  IR-IRS,IRE 
I-INI(IR) 

J-INJ(IR) 


60 
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IF  ((J-I).GT.NCP)  GO  TO  90 

IROIRC+1 

T-A(IR) 

C(IRC)-T/DC(IROW) 

65  INJC(IRC)«J 

DC(J)-DC(J)-T*C(IRC) 
IK(IROW,l)»IK(IROW,l)+l 
90  CONTINUE 
100  CONTINUE 

70  C 

LROW-IRC 

CALL  SEC0ND(TIM2) 

TIMD-TIM2-TIM1 

C 

75  C  OUTPUT  STATISTICS 
C 

WRITE(LP.llO)  TIMD 

110  FORMAT ( 1 4H  RBICO  TIME  «  ,F6.3,5H  SECS) 
WRITE(LP, 120)  LROW 
80  120  FORMAT (8H  LROW  -  ,14) 

IF  (IDC.NE.O)  WRITE(LP,  130)  IDC 
130  FORMAT (4H  **  ,I4,19H  DIAGONALS  MODIFIED) 
WRITE(MP, 140) 

140  FORMAT ( 1 OH  RBICO  END) 

85  C 

RETURN 

END 
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PROGRAM  PROG  2A(OUTPUT, DATA, TAPE50UTPUT,TAPE6“DATA) 

C 

C  DRIVER  PROGRAM  USED  DURING  PHASE  III. 

C  DESIGNED  TO  FIND  THE  EXTREME  EIGENVALUES  OF  THE 
5  C  SYMMETRICALLY  PRECONDITIONED  TEST  MATRICES. 

C  IT  USES : 

C  A)  POINT  RED/BLACK  GRID  POINT  ORDERING  SCHEME  (NTYPE  -  2) 
C  B)  INCOMPLETE  CHOLESKY  FACTORIZATION  WITH  0  DIAGONALS 
C  ADDED,  AS  PERFORMED  BY  SUBROUTINE  ICCGO. 

10  C  SEE  PROG 1A  FOR  PROGRAM  SET-UP  ASSOCIATED  WITH  EACH  OF 
C  THE  TEST  PROBLEMS. 

C  SEE  PR0G2,  GENA,  ICCGO  AND  GETEG2  FOR  MORE  DETAILS. 

C 

REAL  A(5000) ,B(1024) ,W( 1024, 3) , W1 ( 1024, 3) 

15  INTEGER  INK2000) ,INJ(5000) ,IK(1024,2) ,IW(1024) 

C 

C0MM0N/MA3 1 J/LROW ,LCOL , NCP , ND , IPD 
COMMON /MA3 1N/MITE , ACC , EL, ER 
C0MM0N/MA3 1L/EPST0L.U 
20  COMMON  /MA31 1  /DD.LP.MP 

C0MM0N/MA3 1M/NI,NJ, NVERSN, NTYPE 
C 

DATA  U.EPSTOL/l .0E2.2.0E-6/ 

DATA  MITE,ACC,EL,ER/750, i.0E-2,0.0, 1 .2/ 

25  DATA  IAI,IAJ,ND/2000, 5000, 1024/ 

DATA  DD,LP,MP/1.0,6,5/ 

DATA  NI.NJ/32, 32/ 

DATA  NVERSN, NTYPE/ 1,2/ 

C 

30  CALL  GENA(ND, NZ , A, INI , INJ, I AI , I AJ , W , B , IK, IW) 

NZl-NZ+l 

CALL  ICCGO(ND,NZ ,A,INI,INJ, A(NZ1 ) ,INJ(NZ1 ) ,IK, IW , 
*W(l,l),W(l,2)) 

IAJ2»NZ+LR0W 

35  CALL  GETEG2(ND,NZ,A,INI,INJ,NZ,IAJ2,W,IK) 

C 

END 
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PROGRAM  PROG2B(OUTPUT,DATA,TAPE5-OUTPUT,TAPE6»DATA) 

C 

C  DRIVER  PROGRAM  USED  DURING  PHASE  III. 

C  DESIGNED  TO  FIND  THE  EXTREME  EIGENVALUES  OF  THE 
C  SYMMETRICALLY  PRECONDTIONED  TEST  MATRICES. 

C  IT  USES : 

C  A)  LINE  RED/BLACK  GRID  POINT  ORDERING  SCHEME  (NTYPE  -  1) 

C  B)  CHOLESKY  FACTORIZATION  OF  THE  BLOCK  TRI-DIAGONAL  PORTION 

C  OF  MATRIX  A. 

C  SEE  PROG 1 A  FOR  PROGRAM  SET-UP  ASSOCIATED  WITH  EACH  OF 
C  THE  TEST  PROBLEMS. 

C  SEE  PR0G2,  GENA,  BDIAG  AND  GETEG2  FOR  MORE  DETAILS. 

C 

REAL  A(5000) ,B(1024) ,W(1024, 3) ,W1 (1024,3) 

INTEGER  INI(2000) ,INJ (5000) ,IK( 1024, 2) ,IW( 1024) 

C 

C0MM0N/MA3 1 j/LROW,LCOL,NCP, ND, IPD 
COMMON /MA3 1N/MITE , ACC , EL , ER 
COMMON /MA3 1L/EPST0L.U 
COMMON /MA3 1 I /DD , LP , MP 
COMMON /MA3 1M/NI,NJ,NVERSN, NTYPE 
C 

DATA  U , EPSTOL/ 1 . 0E2 , 2 . OE-6/ 

DATA  MITE, ACC, EL, ER/7 50, 1 .0E-2,0.0 , 1 .2/ 

DATA  IAI.IAJ.ND/2000, 5000, 1024/ 

DATA  DD,LP,MP/1.0,6,5/ 

DATA  NI.NJ/32, 32/ 

DATA  NVERSN, NTYPE /I , 1/ 

C 

CALL  GENA(ND,NZ ,A,INI,INJ, IAI, IAJ, W,B,IK, IW) 

NZ1-NZ+1 

CALL  BDIAG(ND, NZ , A, INI , INJ, A(NZ1 ) , INJ(NZ1 ) , IK, IW , 
*W(1,1),W(1,2)) 

IAJ2-NZ+LR0W 

CALL  GETEG2(ND,NZ,A,INI,INJ,NZ,IAJ2,W,IK) 

C 

END 
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PROGRAM  PROG2C (0 UTPUT , DATA , TAPES  O UTPUT , TAPE6 -DATA) 

C 

C  DRIVER  PROGRAM  USED  DURING  PHASE  III. 

C  DESIGNED  TO  FIND  THE  EXTREME  EIGENVALUES  OF  THE 
5  C  SYMMETRICALLY  PRECONDITIONED  TEST  MATRICES. 

C  IT  USES : 

C  A)  2  LINE  RED/BLACK  GRID  POINT  ORDERING  SCHEME  (NTYPE  -  3) 
C  B)  REDUCED  BLOCK  INCOMPLETE  CHOLESKY  FACTORIZATION  WITH 
C  0  DIAGONALS  ADDED  AS  PERFORMED  BY  SUBROUTINE  RBICO. 

10  C  SEE  PROG 1A  FOR  PROGRAM  SET-UP  ASSOCIATED  WITH  EACH  OF  THE 
C  TEST  PROBLEMS. 

C  SEE  PR0G2,  GENA,  RBICO  AND  GETEG2  FOR  MORE  DETAILS. 

C 

REAL  A(5000) ,B( 1024) , W( 1024, 3) ,W1 (1024, 3) 

15  INTEGER  INI(2000) ,INJ(5000) ,IK( 1024,2) , IW(1024 ) 

C 

C0MM0N/MA3 1 J/LROW,LCOL,NCP,ND, IPD 
COMMON /MA3 1 N /MITE , ACC , EL , ER 
C0MM0N/MA3 1L/EPST0L.U 
20  COMMON  /MA3 1 1  /DD ,  LP ,  MP 

COMMON /MA3 1M/NI.NJ,  NVERSN ,  NTYPE 
C 

DATA  U.EPSTOL/l .0E2, 2.0E-6/ 

DATA  MITE, ACC, EL, ER/1 500, 1 .OE-2, 1 .0,0.0/ 

25  DATA  IAI.IAJ, ND/2000, 5000, 1024/ 

DATA  DD,LP,MP/1.0,6,5/ 

DATA  NI.NJ/32, 32/ 

DATA  NVERSN, NTYPE/ 1,3/ 

C 

30  CALL  GENA(ND,NZ ,A,INI,INJ, IAI, IAJ, W,B,IK, IW) 

NZ1-NZ+1 

NCP-NJ 

CALL  RBICO (ND, NZ , A, INI , INJ, A(NZ1 ) , INJ(NZ1 ) , IK, IW , 
*W(1,1),W(1,2)) 

35  IAJ2-NZ+LROW 

CALL  GETEG2(ND,NZ ,A,INI, INJ,NZ,IAJ2, W,IK) 

C 

END 
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PROGRAM  PR0G2D(0UTPUT , DATA.TAPE5  OUTPUT , TAP E6 -DATA) 
C 

C  DRIVER  PROGRAM  USED  DURING  PHASE  III. 

C  DESIGNED  TO  FIND  THE  ESTREME  EIGENVALUES  OF  THE 
C  SYMMETRICALLY  PRECONDITIONED  TEST  MATRICES. 

C  IT  USES: 

C  A)  NATURAL  GRID  POINT  ORDERING  SCHEME  (NTYPE  -  0) 

C  B)  NO  PRECONDITIONING 

C  SEE  PROG 1 A  FOR  PROGRAM  SET-UP  ASSOCIATED  WITH  EACH  OF 
C  THE  TEST  PROBLEMS. 

C  SEE  PR0G2,  GENA  AND  GETEG2  FOR  MORE  DETAILS. 

C 

REAL  A(5000) ,B(1024) ,W( 1024, 3) ,W1 (1024, 3) 

INTEGER  INI (2000 ) , INJ (5000 ) , I ( 104 , 2 ) , IW( 1024 ) 

C 

COMMON /MA3 1 J/LROW ,LCOL, NCP , ND , IPD 
COMMON /MA3 IN /MITE , ACC , EL , ER 
COMMON /MA3 1L/EPSTOL.U 
COMMON /MA3 1 I/DD, LP, MP 
COMMON/MA3 1M/N I , N J , NVERSN , NTYPE 
C 

DATA  U.EPSTOL/l .0E2, 2.0E-6/ 

DATA  MITE,ACC,EL,ER/750, 1 .0E-2,0.0, 1 .2/ 

DATA  IAI, I AJ, ND/2000, 5000, 1024/ 

DATA  DD,LP,MP/1.0,6,5/ 

DATA  NI,NJ/32,32/ 

DATA  NVERSN,NTYPE/1 ,0/ 

C 

CALL  GENA(ND, NZ , A, INI , INJ, IAI , I AJ , W , B , IK, IW) 

LROW-O 

DO  10  I-l.ND 
IK(I,l)-0 
IK(I,2)-I 
W(I,2)-1.0 
10  CONTINUE 
C 

WRITE(LP, 15) 

15  FORMAT(19H  NO  PRECONDITIONING) 

IAJ2-NZ+LROW 

CALL  GETEG2 (ND ,NZ,A,INI,INJ,NZ,IAJ2,W,IK) 

C 

END 
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SUBROUTINE  GETEG2 (NN , NZ , A , IN I , IN J , IAI , IAJ , W , IK) 

C 

C  SUBROUTINE  TO  CALCULATE  THE  HIGH  AND  LOW  ORDER 
C  EIGENVALUES  OF  OUR  SYMMETRICALLY  PRECONDITIONED 
5  C  INPUT  MATRIX  USING  THE  HARWELL  EA14A  LANCZOS 
C  ALGORITHM.  SEE  SOLVE  FOR  DESCRIPTION  OF  INPUT 
C  PARAMETERS. 

C 

REAL  A(IAJ) ,W(NN,3) 

10  REAL  EIG(1024) ,U(1024) ,V(1024) ,T1(1024) ,T2(1024) 

REAL  X(3000 ) ,DEL(3000) , ALFA(5000) ,BETA(5000 ) 
INTEGER  INI(IAI) , INJ(IAJ) ,IK(NN,4) 

INTEGER  NU(3000) 

C 

15  COMMON/EA14BD/PRVT(4) ,IPRVT(6) 

C0MM0N/MA3 1 I /DD , LP , MP 
C0MM0N/MA3 1 J/LR0W,LC0L,NCP, ND, IPD 
COMMON /MA31N/MITE , ACC , EL, ER 
C 

20  DATA  LEIG,LX,LALFA/1024, 3000, 5000/ 

C 

NZ1-NZ+1 
IAJ1-IAJ-NZ 
IFLAG— 1 

25  CALL  SECOND (TIM1 ) 

C 

C  PASS  1  CALCULATES  THE  EIGENVALUES  IN  THE  INTERVAL 
C  EL  TO  ER  AS  SPECIFIED  BY  THE  CALLING  ROUTINE. 

C  PASS  2  CALCULATES  THE  HIGH  ORDER  EIGENVALUES  USING 
30  C  ESTIMATED  NORM  OF  THE  MATRIX  PRODUCED  BY  ROUTINE 

C  EA14A  TO  DEFINE  THE  INTERVAL  TO  BE  EXAMINED. 

C 

DO  290  IPASS-1,2 
C 

35  WRITE(MP,10)  IPASS 

10  FORMAT (6H  PASS  ,I1,6H  START) 

C 

CALL  SEC0ND(TIM2) 

C 

40  C  A  MAXIMUM  OF  MITE  ITERATIONS  ARE  ATTEMPTED  TO 

C  ACQUIRE  ALL  EIGENVALUES  IN  THE  INTERVAL  EL  TO  ER 
C  TO  AN  ACCURACY  OF  ACC. 

C 

DO  30  ITER- 1, MITE 

45  C 

CALL  EA1 4AD(NN , EL , ER , ACC , LEIG, LX, LALFA ,LP , I FLAG, 
*U, V,EIG,NEIG,X, DEL, NU, ALFA, BETA) 

C 
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IF  (IFLAG.EQ.O)  GO  TO  200 
IF  (IFLAG.GT.l)  GO  TO  100 
C 

C  CALCULATES  VECTOR  U  -  VECTOR  U  +  MATRIX  A'  TIMES  VECTOR  V, 
C  WHERE  MATRIX  A'  IS  THE  RESULT  OF  SYMMETRICALLY 
C  PRECONDITIONING  MATRIX  A  BY  MATRIX  C. 

C 

CALL  MA31G2(NN,A(NZ1),INJ(NZ1),IAJ1,W(1,2),IK,V,T1) 
CALL  MA31H(A,W,INI,INJ,NZ,NN,T1,T2) 

CALL  MA31G1(NN, A(NZ1 ) ,INJ(NZ1 ) , IAJ1, W( 1,2) ,IK,T2) 

C 

DO  20  1*1, NN 
U(I)«U(I)  +  T2(I) 

20  CONTINUE 
C 

30  CONTINUE 
GO  TO  180 
C 

C  EA14AD  IS  SIGNALING  FAILURE 
C 

100  WRITE(LP, 1 10)  IFLAG 
WRITE(MP.llO)  IFLAG 

110  FORMAT ( 2 6H0EA14AD  HAS  FAILED.  IFLAG-, 12) 

GO  TO  290 
C 

C  EA14AD  COULDN'T  FINISH  IN  THE  REQUESTED 
C  NUMBER  OF  ITERATIONS 
C 

180  WRITE(LP, 185)  MITE 
WRITE (MP, 185)  MITE 

185  FORMAT(39HO — WARNING  ALL  EIGENVALUES  NOT  FOND  IN, 

*14 , 2X,  10HITERATI0NS ) 

ITER-MITE 

C 

C  OUTPUT  DATA  ON  THE  CALCULATED  EIGENVALUES 
C 

200  CONTINUE 

CALL  SEC0ND(TIM3) 

TRUN-TIM3-TIM2 
WRITE(LP, 202)  IPASS.TRUN 

202  FORMAT (6H  PASS  ,I1,12H  RUN  TIME  -  ,F10.3,5H  SECS) 
WRITE(LP,205)  PRVT(l) 

205  FORMAT ( 1 9H0SPECTRAL  RADIUS  -  ,E14.7) 

WRITE(LP,210)  EL.ER 

210  FORMAT (1 9H  INTERVAL  EXAMINED  ,E13.5,3H  -  ,E13.5) 

WRITE (LP  215) 

215  FORMAT ( 3 OHOD ATA  ON  RESULTING  EIGENVALUES) 

WRITE(LP,220)  ITER, ACC 
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220  FORMAT (8H  ITER  -  ,I3,2X,6HACC  -  ,E13.5) 

WRITE(LP,230)  NEIG 

230  FORMAT ( 2 8H  NUM  DISTINCT  EIGENVALUES  -  ,13) 

100  C 

WRITE(LP,235) 

235  FORMAT ( 2 8HOO RDERED  LIST  OF  EIGENVALUES) 

WRITE (LP, 240)  (EIG(I),I«1,NEIG) 

240  F0RMAT(1X, 10E13. 5) 

105  WRITE(MP,245)  IPASS 

245  F0RMAT(6H  PASS  ,11, 5H  DONE) 

C 

C  PERPARE  TO  EXAMINE  EIGENVALUES  AT  THE  END  OF  THE  SPECTRUM. 
C 

110  EL»PRVT(1)*0.8 

ER-PRVT(1)*1.1 
C 
C 

290  CONTINUE 

115  C 

CALL  SEC0ND(TIM4) 

TT-TIM4-TIM1 
WRITE(LP, 295)  TT 

295  F0RMAT(18H  TOTAL  RUN  TIME  -  ,F10.3,5H  SECS) 

120  C 

RETURN 

END 
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